OPTIMAL MODEL SELECTION FOR DENSITY
ESTIMATION OF STATIONARY DATA UNDER VARIOUS 
MIXING CONDITIONS. 

By Matthieu Lerasle* 

* Institut de Mathématiques de Toulouse (UMR 5219), INSA Toulouse.

We propose a block-resampling penalization method for marginal
density estimation with not necessarily independent observations. When
the data are $\beta$- or $\tau$-mixing, the selected estimator satisfies oracle
inequalities with leading constant asymptotically equal to 1.
We also prove in this setting the slope heuristic, which is a
data-driven method to optimize the leading constant in the penalty.

1. Introduction. Model selection by penalization of an empirical loss is
a general method that includes several famous procedures, such as
cross-validation (Rudemo (1982)) or hard thresholding (Donoho et al. (1996)),
as shown by Barron, Birgé and Massart (1999). The difficulty is to calibrate
the penalty so that the selected estimator satisfies an oracle inequality.
A good penalty has the shape of an ideal one, see the definition (2.4), and
depends in general on a leading constant that should be chosen sufficiently
large.
Resampling penalties provide a shape for the penalty term in a general
statistical learning framework, see Arlot (2009). The resulting estimator
satisfies sharp oracle inequalities in non-Gaussian heteroscedastic regression
among histograms (Arlot (2009)) and in density estimation among more general
collections of models (Lerasle (2009b)). The validity of these theorems relies
on the independence of the observations. In this paper, we study a
generalization of these penalties, called block-resampling penalties, and we
prove that the resulting estimator satisfies sharp oracle inequalities when the
data are only supposed to be $\beta$- or $\tau$-mixing (the coefficient $\beta$ has been
defined by Volkonskii and Rozanov (1959), the coefficient $\tau$ by Dedecker and
Prieur (2005), see Section 2.4).

We use a coupling method to extend the results for independent data. It
was introduced in Baraud, Comte and Viennet (2001) in a regression problem
and used in Comte and Merlevède (2002) for density estimation with
$\beta$-mixing observations. $\beta$ is a well-known "strong" mixing coefficient. We
refer to the books of Doukhan (1994) and Bradley (2007) for examples of
$\beta$-mixing processes. One of the most important is the following: a stationary,
irreducible, aperiodic and positively recurrent Markov chain is $\beta$-mixing.
"Strong" mixing coefficients cannot be used to study many simple processes.
For example (Andrews (1984)), the stationary solution of the equation

(1.1) $X_n = \frac{1}{2}(X_{n-1} + \xi_n)$,

where $(\xi_n)_{n \in \mathbb{Z}}$ are i.i.d. Bernoulli random variables $\mathcal{B}(1/2)$, is not $\beta$-mixing.
This is why "weak" mixing coefficients, such as $\tau$, have been introduced. They
are easier to compute and cover more examples, such as the process
(1.1) (see Dedecker and Prieur (2005), Comte, Dedecker and Taupin (2008)
or the book Dedecker et al. (2007) for examples of weakly-mixing processes).
In Lerasle (2009a), we used a coupling result of Dedecker and Prieur (2005)
to extend the coupling method to $\tau$-mixing data.
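
To make the process (1.1) concrete, here is a minimal simulation sketch. The function name, the burn-in length and the seed are illustrative assumptions, not part of the paper; the burn-in is only there so that the arbitrary starting point is forgotten (its influence is halved at every step). The stationary marginal of this recursion is the uniform law on [0, 1].

```python
import random

def simulate_andrews_process(n, burn_in=200, seed=0):
    """Simulate X_k = (X_{k-1} + xi_k) / 2 with xi_k i.i.d. Bernoulli(1/2).

    Returns a list of pairs (X_k, xi_k) after a burn-in period.
    `burn_in` and `seed` are illustrative choices (assumptions).
    """
    rng = random.Random(seed)
    x = rng.random()  # arbitrary start in [0, 1]
    path = []
    for step in range(burn_in + n):
        xi = rng.randint(0, 1)  # Bernoulli(1/2) innovation
        x = 0.5 * (x + xi)      # recursion (1.1)
        if step >= burn_in:
            path.append((x, xi))
    return path

path = simulate_andrews_process(1000)
xs = [x for x, _ in path]
```

Such a path can serve as test data for the estimators defined in Section 2, since its marginal density is known.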

In these papers, the dimension of the models was used as a shape of the
penalties, but the leading constant was built with the mixing coefficients.
These coefficients cannot in general be computed from the data and, even when
they can, the theoretical upper bounds obtained are probably too pessimistic
to be used by the statistician. The slope algorithm makes it possible to
calibrate this leading constant in an optimal way. It is based on the slope
heuristic, introduced in Birgé and Massart (2007) and proved in Birgé and
Massart (2007) for Gaussian regression, in Arlot and Massart (2009) for
non-Gaussian heteroscedastic regression over histograms and in Lerasle (2009b)
for density estimation.
The second main result of this paper is a proof of the slope heuristic for the
marginal density estimation problem with $\beta$- or $\tau$-mixing data.
Block-resampling penalties and the slope heuristic can be defined in a more
general statistical learning framework, including the problems of
classification and regression (this framework is the one of Massart and
Nédélec (2006)). Our results are contributions to the theoretical
understanding of these generic methods. To our knowledge, they are the first
ones obtained in a mixing framework.

The paper is organized as follows. Section 2 introduces the density
estimation framework, the estimators, the penalties and the main assumptions.
Sections 3 and 4 give the main results, respectively for $\tau$- and $\beta$-mixing
processes. Section 5 gives the proofs of the main results. Some other proofs
are available as supplementary material (Supplement A).

2. Preliminaries. 

2.1. The density estimation framework. We observe n real valued,
identically distributed random variables $X_1, ..., X_n$, defined on a
probability space $(\Omega, \mathcal{A}, \mathbb{P})$, with common law P. We assume that P is
absolutely continuous with respect to the Lebesgue measure $\mu$ on $\mathbb{R}$ and we
want to estimate the density s of P with respect to $\mu$. $L^2(\mu)$ denotes the
Hilbert space of square integrable real valued functions and $\|.\|$ the
associated $L^2$-norm. We assume that s belongs to $L^2(\mu)$. The risk of an
estimator $\hat{s}$ of s is measured with the $L^2$-loss, that is $\|s - \hat{s}\|^2$, which is
random when $\hat{s}$ is.
Let p, q be two integers and assume that n = 2pq. For all i = 0, ..., p - 1, let
$I_i = (2iq + 1, ..., (2i + 1)q)$, $A_i = (X_l)_{l \in I_i}$. For all functions t in $L^1(P)$, for
all reals $x_1, ..., x_q$, we define

$L_q t(x_1, ..., x_q) = \frac{1}{q} \sum_{i=1}^{q} t(x_i)$, $Pt = \int t(x) s(x) d\mu(x)$, $P_A t = \frac{1}{p} \sum_{i=0}^{p-1} L_q t(A_i)$.

Given a linear space $S_m$ of measurable, real valued functions, and an
orthonormal basis $(\psi_\lambda)_{\lambda \in \Lambda_m}$ of $S_m$, we define the projection estimator
$\hat{s}_{A,m}$ of s onto $S_m$ by

$\hat{s}_{A,m} = \sum_{\lambda \in \Lambda_m} (P_A \psi_\lambda) \psi_\lambda = \arg\min_{t \in S_m} \{ \|t\|^2 - 2 P_A t \}$.

Given a finite collection $(S_m)_{m \in \mathcal{M}_n}$ of such linear spaces and a penalty
function pen : $\mathcal{M}_n \to \mathbb{R}^+$, the Penalized Projection Estimator, hereafter
PPE, is defined by

(2.1) $\tilde{s}_A = \hat{s}_{A,\hat{m}}$, where $\hat{m} \in \arg\min_{m \in \mathcal{M}_n} \|\hat{s}_{A,m}\|^2 - 2 P_A \hat{s}_{A,m} + \mathrm{pen}(m)$.
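
The construction above (blocks, empirical process $P_A$, projection estimators and the penalized criterion (2.1)) can be sketched as follows. This is only an illustration under assumptions that are not in the paper: the models are regular histograms on [0, 1) with d bins (basis functions $\sqrt{d}$ times bin indicators), the data lie in [0, 1), and the function names and the toy penalty are hypothetical.

```python
import math

def block_indices(n, p, q):
    """0-based index lists of the retained blocks I_0, ..., I_{p-1} (requires n = 2pq)."""
    if n != 2 * p * q:
        raise ValueError("n must equal 2*p*q")
    return [list(range(2 * i * q, (2 * i + 1) * q)) for i in range(p)]

def histogram_coefficients(data, blocks, d):
    """Coefficients P_A psi_b for psi_b = sqrt(d) * 1{[b/d, (b+1)/d)}, b = 0..d-1."""
    kept = sum(len(block) for block in blocks)  # = p*q = n/2 observations used
    coeffs = [0.0] * d
    for block in blocks:
        for idx in block:
            b = min(int(data[idx] * d), d - 1)  # bin of the observation
            coeffs[b] += math.sqrt(d) / kept
    return coeffs

def penalized_selection(data, p, q, dims, pen):
    """Return (d_hat, coefficients) minimizing ||s_hat||^2 - 2 P_A s_hat + pen(d)."""
    blocks = block_indices(len(data), p, q)
    best = None
    for d in dims:
        coeffs = histogram_coefficients(data, blocks, d)
        # For a projection estimator, ||s_hat||^2 - 2 P_A s_hat = -sum of squared coefficients.
        crit = -sum(c * c for c in coeffs) + pen(d)
        if best is None or crit < best[0]:
            best = (crit, d, coeffs)
    return best[1], best[2]

# Toy usage: deterministic points in [0, 1), n = 2pq = 100, hypothetical penalty.
data = [(0.6180339887 * k) % 1.0 for k in range(100)]
blocks = block_indices(100, 10, 5)
d_hat, coeffs = penalized_selection(data, p=10, q=5, dims=[1, 2, 4, 8, 16],
                                    pen=lambda d: 2.0 * d / 50)
```

Only every other block of length q is used, which is why the coefficients are averages over n/2 observations.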

We will say that the PPE satisfies an oracle inequality when one of the two
following inequalities holds.

There exist constants $\kappa > 0$, $\gamma > 1$ and a positive sequence $(K_n)_{n \in \mathbb{N}^*}$
bounded away from 0 such that

(2.2) $\mathbb{P}\left( K_n \|s - \tilde{s}_A\|^2 \leq \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2 \right) \geq 1 - \frac{\kappa}{n^{\gamma}}$.

There exists a positive sequence $(K_n)_{n \in \mathbb{N}^*}$ bounded away from 0 such that

(2.3) $K_n \mathbb{E}(\|s - \tilde{s}_A\|^2) \leq \mathbb{E}\left( \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2 \right)$.

The oracle inequality is said to be sharp when, moreover, $K_n \to 1$ when n
grows to infinity. Inequalities (2.2) are usually preferred to (2.3)
since they describe the typical behavior of the selected estimator and not
only of its expectation.

It is worth mentioning that we only use $\mathrm{Card}(\cup_{i=0}^{p-1} I_i) = pq = n/2$ data to
build the estimator $\tilde{s}_A$. The consequences of this choice are discussed after
Theorem 3.1 and in Section 4.3.

2.2. Block-resampling penalties. We introduce block-resampling penalties
as natural generalizations of resampling penalties. The best estimator
in the collection $(\hat{s}_{A,m})_{m \in \mathcal{M}_n}$ minimizes among $\mathcal{M}_n$ the ideal criterion

$\|s - \hat{s}_{A,m}\|^2 - \|s\|^2 = \|\hat{s}_{A,m}\|^2 - 2 P_A \hat{s}_{A,m} + \mathrm{pen}_{\mathrm{id}}(m)$.

In this decomposition, the ideal penalty $\mathrm{pen}_{\mathrm{id}}(m)$ (Arlot (2009)) is equal to

(2.4) $\mathrm{pen}_{\mathrm{id}}(m) = 2(P_A - P)(\hat{s}_{A,m})$.

To adapt the approach of Arlot (2009) to a dependent setting, we replace the
resampling step by a resampling procedure on the blocks $(A_i)_{i=0,...,p-1}$. Let
$(W_0, ..., W_{p-1})$ be a resampling scheme, that is, a vector of positive random
variables, independent of $(X_i)_{i=1,...,n}$ and exchangeable, which means that,
for all permutations $\xi$ of {0, ..., p - 1},

$(W_{\xi(0)}, ..., W_{\xi(p-1)})$ has the same law as $(W_0, ..., W_{p-1})$.

Let $\bar{W} = p^{-1} \sum_{i=0}^{p-1} W_i$ and, for all t in $L^1(P)$, let $P_A^W$ be the
block-resampling empirical process defined by

$P_A^W t = \frac{1}{p} \sum_{i=0}^{p-1} W_i L_q t(A_i)$.

For all integrable random variables $F(X_1, ..., X_n, W_0, ..., W_{p-1})$, let

$\mathbb{E}_W[F(X_1, ..., X_n, W_0, ..., W_{p-1})] = \mathbb{E}[F(X_1, ..., X_n, W_0, ..., W_{p-1}) | X_1, ..., X_n]$.

Let $((\psi_\lambda)_{\lambda \in \Lambda_m})_{m \in \mathcal{M}_n}$ be orthonormal bases of $(S_m)_{m \in \mathcal{M}_n}$ and let
$(\hat{s}^W_{A,m})_{m \in \mathcal{M}_n}$ be the collection of resampling projection estimators

$\hat{s}^W_{A,m} = \sum_{\lambda \in \Lambda_m} (P_A^W \psi_\lambda) \psi_\lambda$.

The block-resampling penalties are defined as block-resampling estimators
of the ideal penalty by

(2.5) $\mathrm{pen}_W(m, C) = C \mathbb{E}_W\left( 2(P_A^W - \bar{W} P_A)(\hat{s}^W_{A,m}) \right)$.

The idea of resampling is to mimic the behavior of the empirical process
$P_A$ around P by the behavior of the resampling empirical process $P_A^W$
around $\bar{W} P_A$. The resampling procedure is a plug-in method where the
unknown functionals $F(P, P_n)$ are estimated by $F(\bar{W} P_n, P_n^W)$. Hence, $\hat{s}_{A,m}$
in $\mathrm{pen}_{\mathrm{id}}(m)$ is replaced by $\hat{s}^W_{A,m}$ in $\mathrm{pen}_W(m, C)$ and, instead of applying
the process $P_A - P$, we apply the process $P_A^W - \bar{W} P_A$. We take the
expectation with respect to the distribution of the resampling scheme to
stabilize the procedure. Finally, we leave the normalizing constant C free in
this general definition.

We use a block-resampling scheme instead of a classical exchangeable
resampling scheme in order to preserve the dependence of the data inside the
blocks. This is a key point for the procedure to work. Examples of
resampling schemes can be found in Arlot (2009). The classical block-bootstrap
(Künsch (1989); Liu and Singh (1992)) is obtained when the distribution of
$(W_0, ..., W_{p-1})$ is the multinomial $\mathcal{M}(p, 1/p, ..., 1/p)$.
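
As an illustration of definition (2.5), the sketch below approximates the block-resampling penalty by Monte Carlo with block-bootstrap (multinomial) weights. It relies on the identity $\mathrm{pen}_W(m, C) = 2C \sum_{\lambda} \mathbb{E}_W((P_A^W - \bar{W} P_A)\psi_\lambda)^2$, which is what Fact 0 of Section 5.1 states. Everything else is an assumption made for the example: regular histogram models on [0, 1), data in [0, 1), the number of Monte Carlo draws, and the function names. The conditional expectation $\mathbb{E}_W$ is replaced by an average over simulated weight vectors, so the output is an approximation.

```python
import math
import random

def block_means(data, p, q, d):
    """means[i][b] = L_q psi_b(A_i) for the histogram basis psi_b = sqrt(d) * 1{bin b}."""
    means = []
    for i in range(p):
        row = [0.0] * d
        for x in data[2 * i * q:(2 * i + 1) * q]:  # block A_i (0-based slice)
            row[min(int(x * d), d - 1)] += math.sqrt(d) / q
        means.append(row)
    return means

def multinomial_weights(p, rng):
    """One draw of (W_0, ..., W_{p-1}) from the multinomial M(p; 1/p, ..., 1/p)."""
    counts = [0] * p
    for _ in range(p):
        counts[rng.randrange(p)] += 1
    return counts

def resampling_penalty(data, p, q, d, C, n_draws=200, seed=0):
    """Monte Carlo approximation of pen_W(m, C) for the histogram model with d bins."""
    rng = random.Random(seed)
    means = block_means(data, p, q, d)
    total = 0.0
    for _ in range(n_draws):
        w = multinomial_weights(p, rng)
        w_bar = sum(w) / p  # equals 1 for multinomial weights
        for b in range(d):
            # (P_A^W - W_bar P_A) psi_b = (1/p) sum_i (W_i - W_bar) L_q psi_b(A_i)
            nu = sum((w[i] - w_bar) * means[i][b] for i in range(p)) / p
            total += nu * nu
    return 2.0 * C * total / n_draws

data = [(0.6180339887 * k) % 1.0 for k in range(100)]
pen_c1 = resampling_penalty(data, p=10, q=5, d=4, C=1.0)
pen_c2 = resampling_penalty(data, p=10, q=5, d=4, C=2.0)
pen_flat = resampling_penalty([0.25] * 100, p=10, q=5, d=4, C=1.0)
```

The weights act on whole blocks, never on individual observations, so the dependence inside each block is left untouched. When all blocks have identical averages, as in `pen_flat`, the penalty vanishes.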

2.3. The Slope Algorithm. The "slope heuristic" was introduced
by Birgé and Massart (2007) in order to calibrate the leading constant in
a penalty term (for example the constant C in (2.5)). It is based on the
behavior of the complexity of the selected model (recall the definition (2.1)).
It states that there exist a family $(\Delta_m)_{m \in \mathcal{M}_n}$ and a constant $K_{\min}$
satisfying the following properties.

SH1 When $\mathrm{pen}(m) \leq K \Delta_m$, with $K < K_{\min}$, then $\Delta_{\hat{m}} \geq c_1 \max_{m \in \mathcal{M}_n} \Delta_m$.
SH2 When $\mathrm{pen}(m) = K \Delta_m$, with $K > K_{\min}$, then $\Delta_{\hat{m}}$ is much smaller.
SH3 When $\mathrm{pen}(m) = 2 K_{\min} \Delta_m$, then the selected estimator $\tilde{s}_A$ satisfies an
optimal oracle inequality.

Based on this heuristic, Birgé and Massart (2007) introduced the following
slope algorithm. It can be used in practice when a family $(\Delta_m)_{m \in \mathcal{M}_n}$
satisfying the slope heuristic is known.

• For all K > 0, compute $\Delta_{\hat{m}(K)}$, where $\hat{m}(K)$ is defined as in (2.1) with
$\mathrm{pen}(m) = K \Delta_m$.

• Find $\tilde{K}$ such that $\Delta_{\hat{m}(K)}$ is very large for $K < \tilde{K}$ and much smaller
when $K > \tilde{K}$.

• Choose the final $\hat{m}$ equal to $\hat{m}(2\tilde{K})$.

The idea is that $\tilde{K} \simeq K_{\min}$ since we observe a jump of the complexity of
the selected model around $K = \tilde{K}$ (thanks to SH1, SH2) and thus that
the final estimator, selected by the penalty $2\tilde{K}\Delta_m \simeq 2K_{\min}\Delta_m$, satisfies an
optimal oracle inequality (by SH3).
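
The three steps of the slope algorithm can be sketched as below. The sketch assumes that the empirical criterion $\|\hat{s}_{A,m}\|^2 - 2P_A\hat{s}_{A,m}$ and the complexities $\Delta_m$ have already been computed for each model, and it locates $\tilde{K}$ as the grid point right after the largest drop of the selected complexity. This "largest jump" rule is one common way to implement the second step and is an assumption here, as are the function names and the toy numbers, which were built so that the jump occurs near K = 1.

```python
def selected_model(crit, complexity, K):
    """Index of the model minimizing crit[m] + K * complexity[m]."""
    return min(range(len(crit)),
               key=lambda m: (crit[m] + K * complexity[m], complexity[m]))

def slope_algorithm(crit, complexity, K_grid):
    """Return (K_tilde, index of the final model selected with penalty 2 * K_tilde * Delta_m)."""
    K_grid = sorted(K_grid)
    # Step 1: complexity of the selected model along the grid of constants K.
    path = [complexity[selected_model(crit, complexity, K)] for K in K_grid]
    # Step 2: locate the largest drop of the selected complexity (assumed jump rule).
    drops = [path[j] - path[j + 1] for j in range(len(path) - 1)]
    j_star = max(range(len(drops)), key=drops.__getitem__)
    K_tilde = K_grid[j_star + 1]
    # Step 3: final model, selected with the penalty 2 * K_tilde * Delta_m.
    return K_tilde, selected_model(crit, complexity, 2.0 * K_tilde)

# Toy example (hypothetical numbers): 50 nested models, Delta_m = m / 100.
complexity = [m / 100.0 for m in range(1, 51)]
crit = [1.0 / m - m / 100.0 for m in range(1, 51)]
K_grid = [0.1 * k for k in range(1, 31)]
K_tilde, m_final = slope_algorithm(crit, complexity, K_grid)
```

In this toy example the largest model is selected for every K up to 1, and the selected complexity collapses just after, so `K_tilde` is the first grid point above 1.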

2.4. Some measures of dependence. 

2.4.1. $\beta$-mixing data. Volkonskii and Rozanov (1959) defined the
coefficient $\beta$ as follows. Let Y be a random variable defined on a probability
space $(\Omega, \mathcal{A}, \mathbb{P})$ and let $\mathcal{M}$ be a $\sigma$-algebra in $\mathcal{A}$, let

$\beta(\mathcal{M}, \sigma(Y)) = \mathbb{E}\left( \sup_{A \in \mathcal{B}(\mathbb{R})} |\mathbb{P}_{Y|\mathcal{M}}(A) - \mathbb{P}_Y(A)| \right)$.

For all stationary sequences of random variables $(X_n)_{n \in \mathbb{Z}}$ defined on
$(\Omega, \mathcal{A}, \mathbb{P})$, let

$\beta_k = \beta(\sigma(X_i, i \leq 0), \sigma(X_i, i \geq k))$.

The process $(X_n)_{n \in \mathbb{Z}}$ is said to be $\beta$-mixing when $\beta_k \to 0$ as $k \to \infty$.

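2.4.2. $\tau$-mixing data. Dedecker and Prieur (2005) defined the coefficient
$\tau$ as follows. For all l in $\mathbb{N}^*$, for all x, y in $\mathbb{R}^l$, let $d_l(x, y) = \sum_{i=1}^{l} |x_i - y_i|$.
For all l in $\mathbb{N}^*$, for all functions t defined on $\mathbb{R}^l$, the Lipschitz semi-norm of
t is defined by

$\mathrm{Lip}_l(t) = \sup_{x \neq y \in \mathbb{R}^l} \frac{|t(x) - t(y)|}{d_l(x, y)}$.

For all functions t defined on $\mathbb{R}$, we will denote for short $\mathrm{Lip}(t) = \mathrm{Lip}_1(t)$.
Let $\Lambda_l$ be the set of all functions t : $\mathbb{R}^l \to \mathbb{R}$ such that $\mathrm{Lip}_l(t) \leq 1$. For all
integrable, $\mathbb{R}^l$-valued random variables Y defined on a probability space
$(\Omega, \mathcal{A}, \mathbb{P})$ and all $\sigma$-algebras $\mathcal{M}$ in $\mathcal{A}$, let

$\tau(\mathcal{M}, Y) = \mathbb{E}\left( \sup_{t \in \Lambda_l} |\mathbb{P}_{Y|\mathcal{M}}(t) - \mathbb{P}_Y(t)| \right)$.

For all stationary sequences of integrable random variables $(X_n)_{n \in \mathbb{Z}}$ defined
on $(\Omega, \mathcal{A}, \mathbb{P})$, for all integers k, r, let

$\tau_{k,r} = \max_{1 \leq l \leq r} \frac{1}{l} \sup_{k \leq i_1 < ... < i_l} \{ \tau(\sigma(X_p, p \leq 0), (X_{i_1}, ..., X_{i_l})) \}$, $\tau_k = \sup_{r \in \mathbb{N}^*} \tau_{k,r}$.

The process $(X_n)_{n \in \mathbb{Z}}$ is said to be $\tau$-mixing when $\tau_k \to 0$ as $k \to \infty$.

2.5. Main Assumptions.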



2.5.1. A specific collection for $\tau$-mixing sequences. Wavelet spaces have
been widely used in density estimation, in particular, because the oracle is
adaptive over Besov spaces (see Birgé and Massart (1997)).

Dyadic wavelet spaces: Let r be a real number, r > 1. We work with an
r-regular orthonormal multiresolution analysis of $L^2(\mu)$, associated with a
compactly supported scaling function $\phi$ and a compactly supported mother
wavelet $\psi$. Without loss of generality, we suppose that the support of the
functions $\phi$ and $\psi$ is included in an interval $[A_1, A_2)$ where $A_1$ and $A_2$
are integers such that $A_2 - A_1 = A \geq 1$. For all k in $\mathbb{Z}$ and j in $\mathbb{N}^*$, let
$\psi_{0,k} : x \to \sqrt{2}\phi(2x - k)$ and $\psi_{j,k} : x \to 2^{j/2}\psi(2^j x - k)$. The family
$\{(\psi_{j,k})_{j \geq 0, k \in \mathbb{Z}}\}$ is an orthonormal basis of $L^2(\mu)$. The collection of dyadic
wavelet spaces is described as follows.

[W] dyadic wavelet generated spaces: let $J_n = [\log_2(n)]$, for all $J_m =
1, ..., J_n$, let

$\Lambda_m = \{(j, k), 0 \leq j \leq J_m, k \in \mathbb{Z}\}$

and let $S_m$ be the linear span of $\{\psi_\lambda\}_{\lambda \in \Lambda_m}$.

2.5.2. General framework. We present in this section a set of hypotheses
sufficient to prove the theorems. None of them is used to build the penalties.

H1 : There exists a constant $\kappa_a$ such that, for all m, m' in $\mathcal{M}_n$, for all
t in $S_m + S_{m'}$, with $\|t\| \leq 1$, there exist $t_m$ in $S_m$ and $t_{m'}$ in $S_{m'}$, with
$\|t_m\| \vee \|t_{m'}\| \leq \kappa_a$ such that $t = t_m + t_{m'}$.

This assumption is typically satisfied for nested collections such as [W].

H2 : $N_n = \mathrm{Card}(\mathcal{M}_n)$ is finite and there exist constants $c_{\mathcal{M}}$, $\alpha_{\mathcal{M}}$ such that
$N_n \leq c_{\mathcal{M}} n^{\alpha_{\mathcal{M}}}$.

This assumption means that the collection is not too rich and thus, that the
model selection problem is not too hard. It is satisfied by the collection [W].

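Let us introduce some notations. For all m in $\mathcal{M}_n$, let $s_m$ denote the
orthogonal projection of s onto $S_m$ and, for all orthonormal bases
$(\psi_\lambda)_{\lambda \in \Lambda_m}$ of $S_m$, let

$D_{A,m} = q \sum_{\lambda \in \Lambda_m} \mathrm{Var}(L_q(\psi_\lambda)(A_0))$, $R_{A,m} = n\|s - s_m\|^2 + 2D_{A,m}$,

$B_m = \{t \in S_m, \|t\| \leq 1\}$, $b_m = \sup_{t \in B_m} \|t\|_\infty$.

$D_{A,m}$, and thus $R_{A,m}$, are well defined since we can check with the
Cauchy-Schwarz inequality that

$D_{A,m} = q \mathbb{E}\left( \sup_{t \in B_m} (L_q t(A_0) - Pt)^2 \right)$.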

Two quantities will play a fundamental role to discuss the results. The first
one is the risk of an oracle,

$R_n = \inf_{m \in \mathcal{M}_n} R_{A,m}$.

We are typically interested in nonparametric problems where $R_n/n \sim n^{-\gamma}$
for some $0 < \gamma < 1$. This situation occurs, for example, when s is a regular
function; in this case, we have $R_n/n = \kappa n^{-2\alpha/(2\alpha+1)}$, for some $\alpha > 0$, $\kappa > 0$.
We will make the following assumption.

H3 : There exists a constant $\kappa_R > 0$ such that $R_n \geq \kappa_R (\ln n)^8$.

The constant 8 in $(\ln n)^8$ is technical; it yields the rate $\epsilon_n = (\ln n)^{-1/2}$ in the
oracle inequalities. Arlot (2009) replaced this assumption by a lower bound
on the bias of the models. It implies that $R_n \geq \kappa n^{\gamma}$, for some constants
$\kappa > 0$, $1 > \gamma > 0$, and therefore Assumption H3.

H4 : There exists a constant $c_D > 0$ such that,

$\forall m \in \mathcal{M}_n$, $P\left( \sup_{t \in B_m} t^2 \right) \geq c_D b_m^2$.

It is shown in Section 6 that some classical examples of collections
$(S_m, m \in \mathcal{M}_n)$, such as regular histograms, Fourier spaces and [W], satisfy H4.
The following assumptions will be used to prove the slope heuristic. We
introduce a second quantity that will play a fundamental role. Let

$D_n^* = \max_{m \in \mathcal{M}_n} D_{A,m}$.

In classical collections of models, like [W], $D_n^* \sim cn$. This is why we introduce
the following assumption.

H5 : $D_n^*/R_n \to \infty$ when n grows to infinity.

We will prove that, when the data are mixing, $D_{A,m}$, which is of the order of
$n \mathbb{E}(\|s_m - \hat{s}_{A,m}\|^2)$, represents the variance term of the risk. It is a natural
measure of the complexity of the models. Hence, $D_n^*$ represents the maximal
complexity of the models. Moreover, $R_n$ is the risk of the oracle. It balances
the complexity and the bias term and has therefore the same order as the
complexity of an oracle. Hence, Assumption H5 means that the largest complexity
in the collection $(S_m)_{m \in \mathcal{M}_n}$ is much larger than the one of an oracle, which is a
natural condition for the slope heuristic to hold. We need a final assumption.

H6 : For all $m^*$ such that $D_{A,m^*} = D_n^*$, we have

$\frac{n\|s - s_{m^*}\|^2}{D_n^*} \to 0$ when $n \to \infty$.

When $D_n^*$ is of order n, H6 simply means that the distance between s and
a complex model goes to 0. In general, it means that for these complex
models, the bias part of the risk is negligible compared to the variance part.
We conclude this section by the assumptions on the mixing coefficients. All
mean that these coefficients are sufficiently small. Let $\gamma = \beta$ or $\tau$.

[AR($\theta$)] arithmetical $\gamma$-mixing with rate $\theta$: there exists C > 0 such that, for
all k in $\mathbb{N}$, $\gamma_k \leq C(1 + k)^{-(1+\theta)}$.

S($\beta$) : $\sum_{l \geq 1} (l + 1)\beta_l \leq c_D/64$, where $c_D$ is defined in H4.

We prove in the Appendix that $c_D = 1$ for regular histograms and Fourier spaces.

/ 2 \l/3 

S(r,W) : z2i>i ( INI T i ) — C(W), where C{W) depends only on (f>, tp. 

The value of the constant C(W) is given in the supplementary material (Supplement A).

3. Results for $\tau$-mixing sequences.

3.1. Resampling penalties. The result of this section is that PPE selected
by block-resampling penalties satisfy sharp oracle inequalities.

Theorem 3.1. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of regular wavelet spaces [W] satisfying H3, H4. Let p, q be two
integers such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.
Let $C_W = \mathrm{Var}(W_1 - \bar{W})^{-1}$, $C > C_W/2$ and let $\tilde{s}_A$ be the PPE defined in
(2.1) with the penalty $\mathrm{pen}_W(m, C)$ defined in (2.5).

Assume that there exists $\theta > 5$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\tau$-mixing and satisfy S($\tau$,W). Let $\epsilon_n = (\ln n)^{-1/2}$, $\kappa(C) = 2(C C_W^{-1} - 1)$.
There exist constants $\kappa_1$, $\kappa_2$ such that we have

(3.1) $K_n \mathbb{E}(\|s - \tilde{s}_A\|^2) \leq \mathbb{E}\left( \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2 \right) + \frac{\kappa_2}{n}$,

with

$K_n = \frac{(1 \wedge (1 + \kappa(C))) - \kappa_1 \epsilon_n}{(1 \vee (1 + \kappa(C))) + \kappa_1 \epsilon_n}$.
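
The constraint on the number of blocks in Theorem 3.1 is easy to check numerically. The helper below (its name is an illustrative assumption) lists all pairs (p, q) with 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$. It is only a sketch of the arithmetic; the theorem says nothing about which admissible pair to prefer.

```python
import math

def admissible_block_sizes(n):
    """All (p, q) with 2*p*q == n and sqrt(n)(ln n)^2 / 2 <= p <= sqrt(n)(ln n)^2."""
    if n < 2 or n % 2:
        return []
    half = n // 2
    lower = 0.5 * math.sqrt(n) * math.log(n) ** 2
    upper = 2.0 * lower
    first = max(1, math.ceil(lower))
    last = min(half, math.floor(upper))
    # p must divide n/2 so that q = n / (2p) is an integer.
    return [(p, half // p) for p in range(first, last + 1) if half % p == 0]

pairs_large = admissible_block_sizes(10 ** 6)   # [(100000, 5), (125000, 4)]
pairs_small = admissible_block_sizes(1000)      # [] : the lower bound exceeds n/2
```

The empty list for n = 1000 shows that the condition is of an asymptotic nature: for moderate n the lower bound on p can exceed n/2, the largest possible number of blocks.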



Comments:

• The constant C has to be chosen asymptotically equal to $C_W$. However,
from a non-asymptotic point of view, it is helpful to choose C slightly
larger. A first reason is that (3.1) is useless when $K_n < 0$, which occurs
if $\kappa_1 \epsilon_n > 1$ and $C = C_W$.

3.2. Slope heuristic. Theorem 3.1 gives a totally data-driven penalty
which satisfies a sharp oracle inequality; therefore, the heuristic is not
necessary to obtain asymptotically optimal results. However, we saw that C can
be optimized for small samples. Moreover, the slope algorithm is faster to
compute than the resampling penalties when a deterministic quantity can
be used in the slope heuristic. Theorem 3.2 hereafter justifies property SH1
of the heuristic. $\Delta_m$ is the variance term $D_{A,m}/n$ and $K_{\min} = 2$.

Theorem 3.2. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of regular wavelet spaces [W] satisfying H3, H4, H5, H6. Let p,
q be two integers such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.
Assume that there exists a constant $0 < \delta < 1$ such that, for all m in $\mathcal{M}_n$,

(3.2) $0 \leq \mathrm{pen}(m) \leq (2 - \delta)\frac{D_{A,m}}{n}$,

and let $\tilde{s}_A$ be the PPE defined in (2.1).

Assume that there exists $\theta > 5$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\tau$-mixing and satisfy S($\tau$,W). There exist constants $\kappa_1$, $\kappa_2$ such that

45 

(3.3) E(Z> A ,,fc)>-££-Ki. 



(3.4) E - ^|| 2 ) > (E ( inf || fl - <r A , m || 2 ) - ^) . 

Comments: 

• Inequality (3.3) states that $D_{A,\hat{m}}$ is as large as possible when the
penalty term is too small. This is exactly SH1 with $\Delta_m = D_{A,m}$.

• Inequality (3.4) states that the model selected by a too small penalty
is never an oracle. This is another reason why it is interesting to choose
$C > C_W$ in Theorem 3.1.

The following theorem justifies properties SH2, SH3 of the slope heuristic. 

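Theorem 3.3. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of regular wavelet spaces [W] satisfying H3, H4. Let p, q be two
integers such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.
Assume that there exist $\delta_+ > -\delta_- > -1$, $\epsilon > 0$ and some constants $\kappa_1$, $\kappa_2$
satisfying

(3.5) $\mathbb{E}\left( \sup_{m \in \mathcal{M}_n} \left( (2 - \delta_-)\frac{2D_{A,m}}{n} - \mathrm{pen}(m) - \epsilon\frac{R_{A,m}}{n} \right)_+ \right) \leq \frac{\kappa_1}{n}$,

(3.6) $\mathbb{E}\left( \sup_{m \in \mathcal{M}_n} \left( \mathrm{pen}(m) - (2 + \delta_+)\frac{2D_{A,m}}{n} - \epsilon\frac{R_{A,m}}{n} \right)_+ \right) \leq \frac{\kappa_2}{n}$.

Let $\tilde{s}_A$ be the PPE defined in (2.1) with pen(m) and let $\epsilon_n = (\ln n)^{-1/2}$.
Assume that there exists $\theta > 5$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\tau$-mixing and satisfy S($\tau$,W). There exist constants $\kappa_1$, $\kappa_2$, $\kappa_3$ such that

(3.7) $K_n \mathbb{E}(\|\tilde{s}_A - s\|^2) \leq \mathbb{E}\left( \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2 \right) + \frac{\kappa_2}{n}$,

with

$K_n = \frac{(1 \wedge (1 - \delta_-)) - \kappa_1(\epsilon_n + \epsilon)}{(1 \vee (1 + \delta_+)) + \kappa_1(\epsilon_n + \epsilon)}$.

Moreover, we have

(3.8) $K_n \mathbb{E}(D_{A,\hat{m}}) \leq R_n + \kappa_3$.

Comments: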



• When pen(m) becomes larger than $2D_{A,m}/n$, $D_{A,\hat{m}}$ jumps from $D_n^*$
(3.3) to $R_n$ ((3.8) for $\delta_+$ and $-\delta_-$ close to $-1$). This justifies SH2
since $R_n \ll D_n^*$.

• A model selected with a penalty $4D_{A,m}/n$ satisfies an oracle inequality
(Theorem 3.3 for $\delta_+$ and $\delta_-$ close to 0). This justifies SH3.

• $D_{A,m}$ is unknown and cannot be used in the slope algorithm. It can
be shown (see Lemma 5.2 in the supplementary material) that $D_{A,m}$
satisfies $\kappa_* 2^{J_m} \leq D_{A,m} \leq \kappa^* 2^{J_m}$. The slope heuristic might hold for
$\Delta_m = 2^{J_m}/n$, but a complete proof requires moreover that

However, we obtain in the proof of Theorem 3.1 that, for $C = C_W$,
$\mathrm{pen}_W(m, C)$ satisfies (3.5) and (3.6) for $\delta_+ = \delta_- = 0$ and

Since (3.2) can be modified to work with random penalties, we can
apply the slope algorithm with $\mathrm{pen}_W(m, 1)$ instead of $D_{A,m}/n$.



4. Results for $\beta$-mixing sequences. We show that block-resampling
penalties select oracles and that the slope heuristic holds in this case.

4.1. Resampling penalties.

Theorem 4.1. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of linear spaces satisfying H1, H2, H3, H4. Let p, q be two integers
such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.

Let $C_W = \mathrm{Var}(W_1 - \bar{W})^{-1}$, $C > C_W/2$ and let $\tilde{s}_A$ be the PPE defined in
(2.1) with the block-resampling penalty $\mathrm{pen}_W(m, C)$ defined in (2.5).
Assume that there exists $\theta > 2$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\beta$-mixing and satisfy S($\beta$). Let $\epsilon_n = (\ln n)^{-1/2}$, $\kappa(C) = 2(C C_W^{-1} - 1)$.
There exist constants $\kappa_1$, $\kappa_2$ such that

(4.1) $\mathbb{P}\left( K_n \|s - \tilde{s}_A\|^2 \leq \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2 \right) \geq 1 - \kappa_2\left( \frac{1}{n^2} \vee \frac{(\ln n)^{4+2\theta}}{n^{\theta/2}} \right)$,

with

$K_n = \frac{(1 \wedge (1 + \kappa(C))) - \kappa_1 \epsilon_n}{(1 \vee (1 + \kappa(C))) + \kappa_1 \epsilon_n}$.



Comments: 



• The coupling lemma of Berbee (1979) for $\beta$-mixing processes is much
stronger than the one satisfied by $\tau$-mixing data (Dedecker and Prieur
(2005)). This is why Theorem 4.1 covers more collections of models
than Theorem 3.1 and why we prove oracle inequalities in probability.

4.2. Slope heuristic. The following theorems are adaptations to the
$\beta$-mixing case of Theorems 3.2 and 3.3.

Theorem 4.2. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of linear spaces satisfying H1, H2, H3, H4, H5, H6. Let p, q be
two integers such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.
Let $\tilde{s}_A$ be the PPE defined in (2.1) with a penalty pen(m) satisfying, for all
m in $\mathcal{M}_n$, Condition (3.2) of Theorem 3.2.

Assume that there exists $\theta > 2$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\beta$-mixing and satisfy S($\beta$). There exist a constant $\kappa$ and an event
$\Omega_n$ such that

$\mathbb{P}(\Omega_n) \geq 1 - \kappa\left( \frac{1}{n^2} \vee \frac{(\ln n)^{4+2\theta}}{n^{\theta/2}} \right)$

and, on $\Omega_n$,

(4.2) D Ayn > ^-D* n , \\s - s A \\ 2 > inf ||s - s A , 

9 5 K n meMn 
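Theorem 4.3. Let $X_1, ..., X_n$ be a strictly stationary sequence of real
valued random variables with common density s and let $(S_m)_{m \in \mathcal{M}_n}$ be a
collection of linear spaces satisfying H1, H2, H3, H4. Let p, q be two
integers such that 2pq = n and $\frac{1}{2}\sqrt{n}(\ln n)^2 \leq p \leq \sqrt{n}(\ln n)^2$.
Assume that there exist $\delta_+ > -\delta_- > -1$, $\epsilon > 0$, $0 < \eta < 1$ and an event
$\Omega_{\mathrm{pen}}$, with $\mathbb{P}(\Omega_{\mathrm{pen}}) \geq 1 - \eta$ such that, on $\Omega_{\mathrm{pen}}$, for all m in $\mathcal{M}_n$,

(4.3) $(2 - \delta_-)\frac{2D_{A,m}}{n} - \epsilon\frac{R_{A,m}}{n} \leq \mathrm{pen}(m) \leq (2 + \delta_+)\frac{2D_{A,m}}{n} + \epsilon\frac{R_{A,m}}{n}$.

Let $\tilde{s}_A$ be the PPE defined in (2.1) with pen.

Assume that there exists $\theta > 2$ such that $X_1, ..., X_n$ are arithmetically
[AR($\theta$)] $\beta$-mixing and satisfy S($\beta$). There exist constants $\kappa_1$, $\kappa_2$ and an
event $\Omega^*$ such that

$\mathbb{P}(\Omega^*) \geq 1 - \eta - \kappa_2\left( \frac{1}{n^2} \vee \frac{(\ln n)^{4+2\theta}}{n^{\theta/2}} \right)$

and, on $\Omega^*$,

(4.4) $K_n \|\tilde{s}_A - s\|^2 \leq \inf_{m \in \mathcal{M}_n} \|s - \hat{s}_{A,m}\|^2$,

with

$K_n = \frac{(1 \wedge (1 - \delta_-)) - \kappa_1(\epsilon_n + \epsilon)}{(1 \vee (1 + \delta_+)) + \kappa_1(\epsilon_n + \epsilon)}$.

Moreover, on $\Omega^*$, $2K_n D_{A,\hat{m}} \leq 3R_n$.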

Comments: 

• We refer to the comments of Theorems 3.2 and 3.3 where we explain
why Theorems 4.2 and 4.3 imply the slope heuristic with
$\Delta_m = D_{A,m}/n$, $K_{\min} = 2$.

• As in Theorem 3.3, $D_{A,m}$ cannot be used to build a model selection
procedure. A deterministic shape of $D_{A,m}$ is unknown, although we
prove in the supplementary material that $D_{A,m}$ is bounded by $b_m^2$.
However, $\mathrm{pen}_W(m, 1)$ can be used instead of $D_{A,m}$.

4.3. Discussion and perspectives. Block-resampling penalties are
data-driven procedures for the estimation of the marginal density of mixing
data. They satisfy sharp oracle inequalities without remainder term, and the
theorems hold for possibly infinite dimensional models. This improves Theorems
3.1 and 4.1 in Lerasle (2009a) and Theorem 3.1 in Comte and Merlevède
(2002).
Lacour (2008) gave a model selection procedure to estimate the stationary
density and the transition probability of a Markov chain. She worked with
a stationary chain, irreducible, aperiodic and positively recurrent, which is
therefore $\beta$-mixing. Her density estimator is selected by a penalty equal to
$K d_m/n$ with a constant K that "depends on the law of the chain" (see
Remark 4 after Theorem 3 in Lacour (2008)) and that she proposed to estimate
in the simulations by the slope algorithm. We prove the slope heuristic,
justifying that the slope algorithm can be used to optimize the leading
constant. It would be interesting to see if resampling penalties may be used
in her context to estimate the transition probabilities.

Gannaz and Wintenberger (2009) worked with other weak mixing coefficients
(namely $\lambda$ and $\phi$, see Dedecker et al. (2007) for a definition) and
studied a wavelet thresholded estimator. The main advantage is that the
thresholded estimator is adaptive over a larger class of Besov spaces than
the oracle over the collection [W] (for details about this important issue see
Barron, Birgé and Massart (1999)). The main drawback is that their threshold
is built with the mixing coefficients.

Block-resampling penalties can be extended to the statistical learning
framework of Massart and Nédélec (2006), where the slope algorithm has already
been defined (Arlot and Massart (2009)). We believe that these procedures
perform well in this context but the problem remains open.
The main drawback of our approach is that we use only n/2 data. Moreover,
the deterministic choice of the number p of blocks is not optimized.
For example, when the data are geometrically $\beta$-mixing, which means that,
for some constants $\theta > 0$, C > 0, $\beta_k \leq Ce^{-\theta k}$, choosing p of order $n(\ln n)^{-2}$
would improve the rates of convergence of the leading constant. An interesting
direction of research would be to provide data-driven choices of p and q
to improve these rates, and a data-driven choice of blocks to use more data.
In practice, the computation time is also a very important issue. The
conditional expectation takes some time to evaluate and some effort remains
to be made in this direction. Things can be improved if we obtain a
deterministic shape of the ideal penalty, as in the independent case, since the
slope heuristic is faster to compute with a deterministic $\Delta_m$. We obtain
upper and lower bounds on $\mathrm{pen}_{\mathrm{id}}$, but our inequalities are not sharp enough
to justify completely the slope heuristic. We can also think of the V-fold
cross-validation penalties defined in Arlot (2008). These penalties are also
faster to compute than the resampling penalties. They can be viewed as
resampling penalties defined with non-exchangeable weights. These issues
are far beyond the objectives of the present paper and will be addressed in
forthcoming works.

5. Proofs.

5.1. Notations. Recall that p and q are integers such that 2pq = n,
and that $\sqrt{n}(\ln n)^2/2 \leq p \leq \sqrt{n}(\ln n)^2$. For all k = 0, ..., p - 1, let
$I_k = (2kq + 1, ..., (2k + 1)q)$, $A_k = (X_i)_{i \in I_k}$ and $I = \cup_{k=0}^{p-1} I_k$. For all t in $L^2(\mu)$
and all $x_1, ..., x_q$ in $\mathbb{R}$,

$L_q(t)(x_1, ..., x_q) = \frac{1}{q} \sum_{i=1}^{q} t(x_i)$, $P_A t = \frac{1}{p} \sum_{k=0}^{p-1} L_q(t)(A_k) = \frac{1}{pq} \sum_{i \in I} t(X_i)$,

$\nu_A(t) = (P_A - P)(t)$.

For all m in $\mathcal{M}_n$, we denote by $(\psi_\lambda)_{\lambda \in \Lambda_m}$ an orthonormal basis of $S_m$. The
estimator $\hat{s}_{A,m}$ associated to the model $S_m$ is defined as

$\hat{s}_{A,m} = \sum_{\lambda \in \Lambda_m} (P_A \psi_\lambda) \psi_\lambda$.

Classical computations show that, if $s_m$ denotes the orthogonal projection
of s onto $S_m$,

$s_m = \sum_{\lambda \in \Lambda_m} (P \psi_\lambda) \psi_\lambda$, hence $\|\hat{s}_{A,m} - s_m\|^2 = \sum_{\lambda \in \Lambda_m} (\nu_A \psi_\lambda)^2$.

The ideal penalty $2\nu_A(\hat{s}_{A,m})$ satisfies

$\nu_A(\hat{s}_{A,m}) = \nu_A(\hat{s}_{A,m} - s_m) + \nu_A(s_m) = \sum_{\lambda \in \Lambda_m} (\nu_A \psi_\lambda)^2 + \nu_A(s_m) = \|\hat{s}_{A,m} - s_m\|^2 + \nu_A(s_m)$.

For all m, m' in $\mathcal{M}_n$, let

$p(m) = \|s_m - \hat{s}_{A,m}\|^2 = \sup_{t \in B_m} (\nu_A(t))^2 = \sum_{\lambda \in \Lambda_m} (\nu_A(\psi_\lambda))^2$,

$\delta(m, m') = 2\nu_A(s_m - s_{m'})$.

Hereafter $W_0, ..., W_{p-1}$ denotes a resampling scheme, $\bar{W} = p^{-1} \sum_{i=0}^{p-1} W_i$, and
$P_A^W$ denotes the resampling empirical process, defined for all measurable
functions t by



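$P_A^W t = \frac{1}{p} \sum_{i=0}^{p-1} W_i L_q t(A_i)$.

We introduce also $\nu_A^W = P_A^W - \bar{W} P_A$ and $C_W = (\mathrm{Var}(W_1 - \bar{W}))^{-1}$. For any
orthonormal basis $(\psi_\lambda)_{\lambda \in \Lambda_m}$ of $S_m$, let

$p_W(m) = C_W \sum_{\lambda \in \Lambda_m} \mathbb{E}_W\left( (\nu_A^W(\psi_\lambda))^2 \right)$.

$p_W(m)$ is well defined since, from the Cauchy-Schwarz inequality,

$p_W(m) = C_W \mathbb{E}_W\left( \sup_{t \in B_m} (\nu_A^W t)^2 \right)$.

Let $\epsilon_n = (\ln n)^{-1/2}$ and let $\kappa > 0$. Let $\mathcal{M}$ denote one of the sets $\mathcal{M}_n$ or $\mathcal{M}_n^2$.
When $\mathcal{M} = \mathcal{M}_n$, for all $\mathbf{m} = m$ in $\mathcal{M}$ let $R_{A,\mathbf{m}} = R_{A,m}$ and when $\mathcal{M} = \mathcal{M}_n^2$, for
all $\mathbf{m} = (m, m')$ in $\mathcal{M}$, let $R_{A,\mathbf{m}} = R_{A,m} \vee R_{A,m'}$. For all $\mathbf{m}$ in $\mathcal{M}$, let

(5.1) $f_1(m, \kappa) = p(m) - \frac{2D_{A,m}}{n} - \kappa\epsilon_n\frac{R_{A,m}}{n}$,
(5.2) $f_2(m, \kappa) = \frac{2D_{A,m}}{n} - p(m) - \kappa\epsilon_n\frac{R_{A,m}}{n}$,
(5.3) $f_3(m, \kappa) = p(m) - p_W(m) - \kappa\epsilon_n\frac{R_{A,m}}{n}$,
(5.4) $f_4(m, \kappa) = p_W(m) - p(m) - \kappa\epsilon_n\frac{R_{A,m}}{n}$,
(5.5) $f_5((m, m'), \kappa) = \delta(m, m') - \kappa\epsilon_n\frac{R_{A,m} \vee R_{A,m'}}{n}$.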

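We will use the following fact.
Fact 0: The resampling penalty $\mathrm{pen}_W(m, C)$ defined in (2.5) satisfies

$\mathrm{pen}_W(m, C) = 2 C C_W^{-1} p_W(m)$.

Proof: Let $(\psi_\lambda)_{\lambda \in \Lambda_m}$ be an orthonormal basis of $S_m$. Recall that
$\hat{s}^W_{A,m} = \sum_{\lambda \in \Lambda_m} (P_A^W \psi_\lambda) \psi_\lambda$, so that

$\hat{s}^W_{A,m} - \bar{W}\hat{s}_{A,m} = \sum_{\lambda \in \Lambda_m} (\nu_A^W \psi_\lambda) \psi_\lambda$.

Hence, $\nu_A^W(\hat{s}^W_{A,m} - \bar{W}\hat{s}_{A,m}) = \sum_{\lambda \in \Lambda_m} (\nu_A^W \psi_\lambda)^2$.

We conclude the proof showing that $\mathbb{E}_W(\nu_A^W(\bar{W}\hat{s}_{A,m})) = 0$, hence

$p_W(m) = C_W \mathbb{E}_W\left( \nu_A^W(\hat{s}^W_{A,m} - \bar{W}\hat{s}_{A,m}) \right) = C_W \mathbb{E}_W\left( \nu_A^W(\hat{s}^W_{A,m}) \right) = C_W \frac{\mathrm{pen}_W(m, C)}{2C}$.

Since $W_0, ..., W_{p-1}$ are independent of $X_1, ..., X_n$,

$\mathbb{E}_W\left( \nu_A^W(\bar{W}\hat{s}_{A,m}) \right) = \frac{1}{p^2} \sum_{i,j=0}^{p-1} \sum_{\lambda \in \Lambda_m} L_q(\psi_\lambda)(A_i) L_q(\psi_\lambda)(A_j) \mathbb{E}_W(W_i \bar{W} - (\bar{W})^2)$.

Then, by exchangeability of the weights,

$\mathbb{E}_W(W_i \bar{W} - (\bar{W})^2) = \frac{1}{p}\left( \mathbb{E}(W_i^2) + \sum_{j \neq i} \mathbb{E}(W_i W_j) \right) - \frac{1}{p^2}\left( \sum_{j} \mathbb{E}(W_j^2) + \sum_{j \neq k} \mathbb{E}(W_j W_k) \right) = 0$.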

5.2. Proof of Theorem 3.1. The proof is based on the following Lemma, whose proof is given in the additional material.

Lemma 5.1. Let $X_1, \dots, X_n$ be a strictly stationary sequence of real valued random variables with common density $s$ and let $(S_m)_{m\in\mathcal{M}_n}$ be a collection of regular wavelet spaces [W] satisfying assumptions H3, H4. Let $p$, $q$ be two integers satisfying $2pq = n$ and $\frac{1}{2}\sqrt{n}(\ln n)^2 \le p \le \sqrt{n}(\ln n)^2$. Assume that there exists $\theta > 5$ such that $X_1, \dots, X_n$ are arithmetically [AR($\theta$)] $\tau$-mixing and satisfy S($\tau$, W). There exist constants $\kappa_1$, $\kappa_2$, such that, for all $i = 1, \dots, 5$, for all $m$ in $\mathcal{M}$,

\[ E\left( \sup_{m\in\mathcal{M}} (f_i(m,\kappa_1))_+ \right) \le \frac{\kappa_2}{n}. \tag{5.6} \]

It comes from Fact 0 and the equality $2CC_W^{-1} = \kappa(C) + 2$ that, for all $m$ in $\mathcal{M}_n$,

\[ \mathrm{pen}_W(m,C) - (2 + \kappa(C))p(m) = 2CC_W^{-1}(p_W(m) - p(m)). \tag{5.7} \]

Hence, from (5.6) with $i = 3, 4$, $\mathrm{pen}_W(m,C)$ satisfies Conditions (3.6) and (3.5) of Theorem 3.3 with $\delta_+ = -\delta_- = \kappa(C)$ and $\epsilon = 2\kappa_1 CC_W^{-1}\epsilon_n$. Theorem 3.1 follows from (3.7).

5.3. Proof of Theorem 4.1. The proof is based on the following lemma whose proof is given in additional material.

Lemma 5.2. Let $\theta > 1$ and let $(X_n)_{n\in\mathbb{Z}}$ be an arithmetically [AR($\theta$)] $\beta$-mixing process satisfying S($\beta$). Let $(S_m)_{m\in\mathcal{M}_n}$ be a collection of linear spaces satisfying Assumptions H1, H2, H3, H4. Let $p$, $q$ such that $2pq = n$, $\sqrt{n}(\ln n)^2/2 \le p \le \sqrt{n}(\ln n)^2$. There exist constants $\kappa_1$, $\kappa_2$ which may vary from line to line such that, on an event $\Omega_n$ satisfying

\[ P(\Omega_n) \ge 1 - \kappa_2\left( \frac{(\ln n)^{2(1+\theta)}}{n^{\theta/2}} \vee \frac{1}{n^2} \right), \]

we have

\[ \forall m \in \mathcal{M},\ \forall i = 1, \dots, 5, \quad f_i(m,\kappa_1) \le 0. \tag{5.8} \]

Hence, from (5.6) with $i = 3, 4$, $\mathrm{pen}_W(m,C)$ satisfies Condition (4.3) of Theorem 4.3 with $\delta_+ = -\delta_- = \kappa(C)$ and $\epsilon = 2\kappa_1 CC_W^{-1}\epsilon_n$. Theorem 4.1 follows from (4.4).

5.4. Proof of Theorems 3.2 and 4.2. It is sufficient to prove the results for sufficiently large $n$ since we can increase the constant $\kappa_2$ if necessary. Let $m_o$ be a model such that $R_{A,m_o} = R_n$. Now, by definition, $\hat m$ minimizes among $\mathcal{M}_n$ the following criterion:

\[ \mathrm{Crit}(m) = \|\hat s_{A,m}\|^2 - 2P_A\hat s_{A,m} + \mathrm{pen}(m) + \|s\|^2 + 2\nu_A(s_{m_o}). \]

Fact 1: For all $m$ in $\mathcal{M}_n$,

\[ \mathrm{Crit}(m) = \|s_m - s\|^2 + \mathrm{pen}(m) - p(m) + 2\nu_A(s_{m_o} - s_m). \]

Proof: Recalling that $\|s - \hat s_{A,m}\|^2 = \|\hat s_{A,m}\|^2 - 2P\hat s_{A,m} + \|s\|^2$ and that $(P_A - P)(\hat s_{A,m} - s_m) = \|\hat s_{A,m} - s_m\|^2 = p(m)$, we have

\begin{align*}
\mathrm{Crit}(m) &= \|s - \hat s_{A,m}\|^2 - 2\nu_A(\hat s_{A,m} - s_m) + 2\nu_A(s_{m_o} - s_m) + \mathrm{pen}(m) \\
&= \left(\|s - \hat s_{A,m}\|^2 - \|\hat s_{A,m} - s_m\|^2\right) - p(m) + \mathrm{pen}(m) + 2\nu_A(s_{m_o} - s_m).
\end{align*}

We conclude the proof with Pythagoras equality. □
Fact 2: For all $m$ in $\mathcal{M}_n$, for all constants $\kappa_1$,

\[ (1 + 2\kappa_1\epsilon_n)\frac{2D_{A,m}}{n} \ge -\mathrm{Crit}(m) + (1 - 2\kappa_1\epsilon_n)\|s - s_m\|^2 - \sup_{m\in\mathcal{M}_n} (f_1(m,\kappa_1)) - \sup_{(m,m')\in\mathcal{M}_n^2} (f_5((m,m'),\kappa_1)). \]

Proof: From Fact 1, for all $m$ in $\mathcal{M}_n$, for all $\kappa_1$, since $\mathrm{pen}(m) \ge 0$,

\[ \mathrm{Crit}(m) \ge \|s_m - s\|^2 - f_1(m,\kappa_1) - \frac{2D_{A,m}}{n} - 2\kappa_1\epsilon_n\frac{R_{A,m}}{n} - f_5((m,m_o),\kappa_1). \]

We conclude the proof using that $R_{A,m} = n\|s - s_m\|^2 + 2D_{A,m}$. □

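Fact 3: For all $m$ in $\mathcal{M}_n$, for all constants $\kappa_1$,

\[ (\delta - 4\kappa_1\epsilon_n)\frac{D_{A,m}}{n} \le -\mathrm{Crit}(m) + (1 + 2\kappa_1\epsilon_n)\|s - s_m\|^2 + \sup_{m\in\mathcal{M}_n} (f_2(m,\kappa_1)) + \sup_{(m,m')\in\mathcal{M}_n^2} (f_5((m,m'),\kappa_1)). \]

Proof: From Fact 1, for all $m$ in $\mathcal{M}_n$, for all $\kappa_1$, since $\mathrm{pen}(m) \le (2 - \delta)D_{A,m}/n$,

\[ \mathrm{Crit}(m) \le \|s_m - s\|^2 + f_2(m,\kappa_1) - \delta\frac{D_{A,m}}{n} + 2\kappa_1\epsilon_n\frac{R_{A,m}}{n} + f_5((m_o,m),\kappa_1). \]

We conclude the proof using that $R_{A,m} = n\|s - s_m\|^2 + 2D_{A,m}$. □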
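From Fact 2, we have, for all $\kappa_1$,

\[ (1 + 2\kappa_1\epsilon_n)\frac{2D_{A,\hat m}}{n} \ge -\mathrm{Crit}(\hat m) + (1 - 2\kappa_1\epsilon_n)\|s - s_{\hat m}\|^2 - \sup_{m\in\mathcal{M}_n} (f_1(m,\kappa_1)) - \sup_{(m,m')\in\mathcal{M}_n^2} (f_5((m,m'),\kappa_1)). \]

Let us now consider a model $m^*$ such that $D_{A,m^*} = D_n^*$. By definition of $\hat m$, we have $\mathrm{Crit}(\hat m) \le \mathrm{Crit}(m^*)$. Hence, from Fact 3, we deduce that

\begin{align*}
(1 + 2\kappa_1\epsilon_n)\frac{2D_{A,\hat m}}{n} &\ge -\mathrm{Crit}(m^*) + (1 - 2\kappa_1\epsilon_n)\|s - s_{\hat m}\|^2 - \sup_{m\in\mathcal{M}_n} (f_1(m,\kappa_1)) - \sup_{(m,m')\in\mathcal{M}_n^2} (f_5((m,m'),\kappa_1)) \\
&\ge (\delta - 4\kappa_1\epsilon_n)\frac{D_{A,m^*}}{n} + (1 - 2\kappa_1\epsilon_n)\|s - s_{\hat m}\|^2 - (1 + 2\kappa_1\epsilon_n)\|s - s_{m^*}\|^2 \\
&\quad - \sup_{(m,m')\in\mathcal{M}_n^2} \left(f_1(m,\kappa_1) + f_2(m,\kappa_1) + 2f_5((m,m'),\kappa_1)\right). \tag{5.9}
\end{align*}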

From Lemma 5.1, there exist $\kappa_1$ and $\kappa_2$ such that

\[ E\left( \sup_{(m,m')\in\mathcal{M}_n^2} \left(f_1(m,\kappa_1) + f_2(m,\kappa_1) + 2f_5((m,m'),\kappa_1)\right) \right) \le \frac{\kappa_2}{n}. \]

From Lemma 5.2, there exists $\kappa_1$ such that, on $\Omega_n$,

\[ \sup_{(m,m')\in\mathcal{M}_n^2} \left(f_1(m,\kappa_1) + f_2(m,\kappa_1) + 2f_5((m,m'),\kappa_1)\right) \le 0. \]

Now, assume that n is sufficiently large to ensure that 

(5 1 n||s - s m * || 2 25 
1 n - 4 - 4' D* - 9 

Then, taking the expectation in (5.9), we obtain that 

9E(AmQ > «4 
8n ~~ 2 n n 

Hence (3.3) is proved for n sufficiently large. 
Moreover, on O n , we have 

8n ~ 2 n ' 

Hence the first inequality of (4.2) is proved for n sufficiently large. 
(3.4) and the second inequality of (4.2) follow from the inequality 

n — ii 2 — /-i \ ^-A,rh e \ 
\\s-sa\\ >(l-«ie w J J2(m,«i). 

From Lemma 5.1, there exist constants $\kappa_1$, $\kappa_2$, such that $E(f_2(\hat m,\kappa_1)) \le \kappa_2/n$. We choose $n$ sufficiently large to ensure that $\kappa_1\epsilon_n \le 1/2$, we use (3.3) and we obtain that there exists a constant $\kappa$ such that

E(\\s-s A f)> 2 4^- 

We conclude the proof of (3.4) with the following Fact. 
Fact 4: 



Rn . 16 ( , - ,. ^ |,2 ^ « 

— > t=E mf ||s - sa, 1 



n 17 \mGA4„ ' / n 

d: . 16A* ^ ,, 2 \ 



thus^> — -a E inf || S -3A, m 

n 17it n \ / n 

Proof of Fact 4. Let $\kappa_1$ be the constant previously defined,

\[ \inf_{m\in\mathcal{M}_n} \|s - \hat s_{A,m}\|^2 \le (1 + \kappa_1\epsilon_n) \inf_{m\in\mathcal{M}_n} \left\{ \frac{R_{A,m}}{n} \right\} + \sup_{m\in\mathcal{M}_n} f_1(m,\kappa_1). \]

We conclude the proof with Lemma 5.1. □

We use the first inequality of (4.2) and we obtain that, on Q n , 

s-s A 2 >-^. 

9 n 

We conclude the proof of Theorem 4.2, saying that, on f2 n , we have 



Rn 

n meMn 



inf { ||g - s m \\ 2 + 2DA > m 1 > (1 - Ki£ n ) inf { \\s - s m \\ 2 + p(m) } 
eM n l n J meM n 

15 

= (l-/«ie n ) inf ||s - ?A,.m|| 2 > — inf ||s - m || 2 . 

meMn lb meMn 

Thus, 

2 25 D* n Rn 5D* n ^ a 

9 /l n n 9 _K n mG.M„ 

5.5. Proof of Theorems 3.3 and 4.3. As in the previous proof, it is sufficient to obtain the results for sufficiently large $n$. Let us first prove the oracle inequalities. Let $\kappa_1$ be a constant to be chosen later. Let $\Omega_n$ be the event defined in Lemma 5.2. The key point to prove oracle inequalities is the following fact.

Fact 5: For all $m$ in $\mathcal{M}_n$, for all real numbers $\delta_-$, $\delta_+$ and for all non negative reals $x$, $y$,

\begin{align*}
[(1 \wedge (1 - \delta_-)) - x - y]\,\|s - \tilde s_A\|^2 &\le [(1 \vee (1 + \delta_+)) + x + y]\,\|s - \hat s_{A,m}\|^2 \\
&\quad + \sup_{m\in\mathcal{M}_n} \left\{ \mathrm{pen}(m) - (2 + \delta_+)\|\hat s_{A,m} - s_m\|^2 - x\|s - \hat s_{A,m}\|^2 \right\}_+ \tag{5.10} \\
&\quad + \sup_{m\in\mathcal{M}_n} \left\{ (2 - \delta_-)\|\hat s_{A,m} - s_m\|^2 - \mathrm{pen}(m) - x\|s - \hat s_{A,m}\|^2 \right\}_+ \tag{5.11} \\
&\quad + 2\sup_{(m,m')\in\mathcal{M}_n^2} \left\{ \nu_A(s_{m'} - s_m) - y\left(\|s - \hat s_{A,m}\|^2 + \|s - \hat s_{A,m'}\|^2\right) \right\}_+. \tag{5.12}
\end{align*}

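Proof: By definition of $\tilde s_A = \hat s_{A,\hat m}$, for all $m$ in $\mathcal{M}_n$, we have

\[ \|\tilde s_A\|^2 - 2P_A\tilde s_A + \mathrm{pen}(\hat m) + \|s\|^2 \le \|\hat s_{A,m}\|^2 - 2P_A\hat s_{A,m} + \mathrm{pen}(m) + \|s\|^2. \]

Now, for all $m$ in $\mathcal{M}_n$, since $\|\hat s_{A,m} - s\|^2 = \|\hat s_{A,m}\|^2 - 2P\hat s_{A,m} + \|s\|^2$,

\[ \|\hat s_{A,m}\|^2 - 2P_A\hat s_{A,m} + \|s\|^2 = \|\hat s_{A,m} - s\|^2 - 2(P_A - P)\hat s_{A,m}. \]

Thus, for all $m$ in $\mathcal{M}_n$,

\[ \|\tilde s_A - s\|^2 - 2(P_A - P)\tilde s_A + \mathrm{pen}(\hat m) \le \|\hat s_{A,m} - s\|^2 - 2(P_A - P)\hat s_{A,m} + \mathrm{pen}(m). \]

For all $m$ in $\mathcal{M}_n$, since $(P_A - P)(\hat s_{A,m} - s_m) = \|\hat s_{A,m} - s_m\|^2$,

\[ 2(P_A - P)\hat s_{A,m} = 2\|s_m - \hat s_{A,m}\|^2 + 2(P_A - P)s_m. \]

This yields

\begin{align*}
\|s - \tilde s_A\|^2 &\le \|s - \hat s_{A,m}\|^2 + \mathrm{pen}(m) - 2\|\hat s_{A,m} - s_m\|^2 \\
&\quad + 2\|\hat s_{A,\hat m} - s_{\hat m}\|^2 - \mathrm{pen}(\hat m) + 2\nu_A(s_{\hat m} - s_m).
\end{align*}

We add $-[(\delta_- \vee 0) + (x + y)]\|\tilde s_A - s\|^2$ to the left hand side of the previous inequality and $-\delta_-\|\hat s_{A,\hat m} - s_{\hat m}\|^2 - (x + y)\|\tilde s_A - s\|^2 + [(\delta_+ \vee 0) + x + y]\|s - \hat s_{A,m}\|^2 - \delta_+\|\hat s_{A,m} - s_m\|^2 - (x + y)\|s - \hat s_{A,m}\|^2$ to the right hand side. This is valid because, for all $m$ in $\mathcal{M}_n$, for all reals $\delta$,

\[ [(\delta \vee 0) + x + y]\,\|\hat s_{A,m} - s\|^2 \ge \delta\|\hat s_{A,m} - s_m\|^2 + (x + y)\|\hat s_{A,m} - s\|^2. \]

We obtain

\begin{align*}
[(1 \wedge (1 - \delta_-)) - x - y]\,\|s - \tilde s_A\|^2 &\le [(1 \vee (1 + \delta_+)) + x + y]\,\|s - \hat s_{A,m}\|^2 \\
&\quad + \mathrm{pen}(m) - (2 + \delta_+)\|\hat s_{A,m} - s_m\|^2 - x\|\hat s_{A,m} - s\|^2 \\
&\quad + (2 - \delta_-)\|\hat s_{A,\hat m} - s_{\hat m}\|^2 - \mathrm{pen}(\hat m) - x\|\hat s_{A,\hat m} - s\|^2 \\
&\quad + 2\nu_A(s_{\hat m} - s_m) - y\|\hat s_{A,m} - s\|^2 - y\|\hat s_{A,\hat m} - s\|^2. \quad □
\end{align*}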

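We will also use the following fact.

Fact 6: For all reals $\kappa$ such that $\kappa\epsilon_n \le 1/2$,

\[ \frac{R_{A,m}}{n} \le 2\|s - \hat s_{A,m}\|^2 + 2\{f_2(m,\kappa)\}_+. \]

Proof of Fact 6: We write

\[ \frac{R_{A,m}}{n} = \frac{1}{1 - \kappa\epsilon_n}\left( \frac{R_{A,m}}{n} - \frac{2D_{A,m}}{n} + p(m) \right) + \frac{1}{1 - \kappa\epsilon_n} f_2(m,\kappa). \]

We use that $\kappa\epsilon_n \le 1/2$ and that $R_{A,m} = 2D_{A,m} + n\|s - s_m\|^2$ to conclude the proof. □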

Control of (5.10). Assume that $n$ is sufficiently large to ensure that $\kappa_1\epsilon_n \le 1/2$. We have, from Fact 6,

\[ \mathrm{pen}(m) - (2 + \delta_+)p(m) - 2\epsilon\|\hat s_{A,m} - s\|^2 \le \mathrm{pen}(m) - (2 + \delta_+)p(m) - \epsilon\frac{R_{A,m}}{n} + 2\epsilon\{f_2(m,\kappa_1)\}_+. \]

Applying Lemma 5.1, we obtain constants $\kappa_1$ and $\kappa_2$ such that

\[ E\left( \sup_{m\in\mathcal{M}_n} \{f_2(m,\kappa_1)\}_+ \right) \le \frac{\kappa_2}{n}. \]

Applying Lemma 5.2, we obtain a constant $\kappa_1$ such that, on $\Omega_n$,

\[ \sup_{m\in\mathcal{M}_n} \{f_2(m,\kappa_1)\}_+ \le 0. \]

Moreover, (3.6) ensures that

\[ E\left( \sup_{m\in\mathcal{M}_n} \left\{ \mathrm{pen}(m) - (2 + \delta_+)p(m) - \epsilon\frac{R_{A,m}}{n} \right\}_+ \right) \le \frac{\kappa}{n}. \]

On $\Omega_{pen}$ we have

\[ \sup_{m\in\mathcal{M}_n} \left\{ \mathrm{pen}(m) - (2 + \delta_+)p(m) - \epsilon\frac{R_{A,m}}{n} \right\}_+ \le 0. \]

We choose $x = 2\epsilon$. We obtain that, for Theorem 3.1, the expectation of (5.10) is upper bounded by $\kappa n^{-1}$ and for Theorem 4.1, the term (5.10) is equal to $0$ on $\Omega_n \cap \Omega_{pen}$.

Control of (5.11). Assume that $n$ is sufficiently large to ensure that $\kappa_1\epsilon_n \le 1/2$. We deduce from Fact 6 that

\[ (2 - \delta_-)p(m) - \mathrm{pen}(m) - 2\epsilon\|\hat s_{A,m} - s\|^2 \le (2 - \delta_-)p(m) - \mathrm{pen}(m) - \epsilon\frac{R_{A,m}}{n} + 2\epsilon\{f_2(m,\kappa_1)\}_+. \]

Applying Lemma 5.1, we obtain constants $\kappa_1$ and $\kappa_2$ such that

\[ E\left( \sup_{m\in\mathcal{M}_n} \{f_2(m,\kappa_1)\}_+ \right) \le \frac{\kappa_2}{n}. \]

Applying Lemma 5.2, we obtain a constant $\kappa_1$ such that, on $\Omega_n$,

\[ \sup_{m\in\mathcal{M}_n} \{f_2(m,\kappa_1)\}_+ \le 0. \]

Moreover, (3.5) ensures that

\[ E\left( \sup_{m\in\mathcal{M}_n} \left\{ (2 - \delta_-)p(m) - \mathrm{pen}(m) - \epsilon\frac{R_{A,m}}{n} \right\}_+ \right) \le \frac{\kappa}{n}. \]

On $\Omega_{pen}$ we have

\[ \sup_{m\in\mathcal{M}_n} \left\{ (2 - \delta_-)p(m) - \mathrm{pen}(m) - \epsilon\frac{R_{A,m}}{n} \right\}_+ \le 0. \]

We choose $x = 2\epsilon$. We obtain that, for Theorem 3.1, the expectation of (5.11) is upper bounded by $\kappa n^{-1}$ and for Theorem 4.1, the term (5.11) is equal to $0$ on $\Omega_n \cap \Omega_{pen}$.

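Control of (5.12). Let $m, m'$ in $\mathcal{M}_n$ and let $m_s$ be the index such that $R_{A,m_s} = R_{A,m} \vee R_{A,m'}$ and let $\kappa_1$ be a constant to be chosen later. Assume that $n$ is sufficiently large to ensure that $\kappa_1\epsilon_n \le 1/2$. It comes from Fact 6 that

\begin{align*}
\delta(m,m') &= f_5((m,m'),\kappa_1) + \kappa_1\epsilon_n\frac{R_{A,m_s}}{n} \\
&\le f_5((m,m'),\kappa_1) + 2\kappa_1\epsilon_n\|\hat s_{A,m_s} - s\|^2 + 2\kappa_1\epsilon_n\{f_2(m_s,\kappa_1)\}_+.
\end{align*}

We deduce from Lemma 5.1 that there exist $\kappa_1$ and $\kappa_2$ such that

\begin{align*}
E\left( \sup_{(m,m')\in\mathcal{M}_n^2} \left\{ \delta(m,m') - 2\kappa_1\epsilon_n\left(\|\hat s_{A,m} - s\|^2 + \|\hat s_{A,m'} - s\|^2\right) \right\} \right)
&\le E\left( \sup_{(m,m')\in\mathcal{M}_n^2} \left\{ f_5((m,m'),\kappa_1) + 2\kappa_1\epsilon_n\{f_2(m_s,\kappa_1)\}_+ \right\} \right) \le \frac{\kappa_2}{n}.
\end{align*}

Applying Lemma 5.2, we obtain a constant $\kappa_1$ such that, on $\Omega_n$,

\[ \sup_{(m,m')\in\mathcal{M}_n^2} \left\{ \delta(m,m') - 2\kappa_1\epsilon_n\left(\|\hat s_{A,m} - s\|^2 + \|\hat s_{A,m'} - s\|^2\right) \right\} \le 0. \]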

Conclusion of the proofs. We use Fact 5 with $x = 2\epsilon$ and $y = 2\kappa_1\epsilon_n$. For the proof of Theorem 3.1, we take the expectation: we have obtained that the expectations of the remainder terms (5.10)-(5.12) are upper bounded by $\kappa n^{-1}$ for a sufficiently large $n$. For the proof of Theorem 4.1, we have obtained that the remainder terms (5.10)-(5.12) with $x = 2\epsilon$ and $y = 2\kappa_1\epsilon_n$ are equal to $0$ on $\Omega_n \cap \Omega_{pen}$ when $n$ is sufficiently large. As explained in the beginning of the proof, this is sufficient to conclude the proof of (3.7) and (4.4).
Let us prove (3.8). Let $\kappa_1 \le 1/(2\epsilon_n)$, from Fact 6 and (3.7), we have

— E(2D At rh) < K n ( E(p(m)) + E (/ 2 (m, ki)) + k^E ' 



n \ \ n 

< (l + 2 Kl e n )E((/ 2 (m, Kl )) + ) + (l + 2K ien )K„E(|| S -SA|| 2 

< 2E ((/ 2 (m, Kl )) + ) + 2K n E(\\s - s A \\ 2 ) 

<2(E((/ 2 (m,/si)) + ) +-) +2Rn. 
We used that, by definition, $K_n \le 1$. We conclude the proof with Lemma 5.1.
In order to get the corresponding bound in Theorem 4.3, we use that, on $\Omega_n \cap \Omega_{pen}$, (4.4) holds and there exists a constant $\kappa_1$ such that $\kappa_1\epsilon_n \le 1/2$, satisfying



K, 



< 



n 



1 - KiE r 



+ p*{fh)) 



A" 



1 — K\E r 



SA 



< 



inf 

1 — K\£ n meMn 



^ 1 1 2 1 I K l^n Rn - n Pn 
SA,m\\ S S < 3 . 

1 — K\E n n n 



6. Appendix. We present in this section some classical collections of 
models and prove that they satisfy H4. 

Regular Histograms: Let $d$ be an integer and let $S_d$ be the space of functions $t$ constant on all the intervals $([k/d, (k+1)/d))_{k\in\mathbb{Z}}$. $S_d$ is called the space of regular histograms with size $1/d$. The family $(\psi_k)_{k\in\mathbb{Z}}$, where, for all $k$ in $\mathbb{Z}$, $\psi_k = \sqrt{d}\,1_{[k/d,(k+1)/d)}$, is an orthonormal basis of $S_d$. Let $B_d = \{t \in S_d,\ \|t\|_2 \le 1\}$. From Cauchy-Schwarz inequality, we have

\[ \sup_{t\in B_d} t^2 = \sum_{k\in\mathbb{Z}} \psi_k^2 = d\,1_{\mathbb{R}}. \]

Hence

\[ \left\| \sup_{t\in B_d} t^2 \right\|_\infty = d, \qquad P\left( \sup_{t\in B_d} t^2 \right) = dP(1_{\mathbb{R}}) = d. \]

H4 holds on all the spaces $S_d$, therefore, it holds on the collection $(S_d)_{d=1,\dots,n}$ called the regular histograms collection.
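The identity $\sum_k \psi_k^2 = d$ can be checked numerically. The following Python sketch is only an illustration and is not part of the original argument; it assumes NumPy is available, restricts the basis to the bins covering $[0, 1)$, and the helper names `histogram_basis` and `sup_unit_ball_sq` are hypothetical.

```python
import numpy as np

def histogram_basis(x, d):
    """Evaluate psi_k = sqrt(d) * 1_[k/d,(k+1)/d), k = 0..d-1, at points x in [0, 1)."""
    x = np.asarray(x, dtype=float)
    k = np.floor(x * d).astype(int)            # index of the bin containing each x
    values = np.zeros((x.size, d))
    values[np.arange(x.size), k] = np.sqrt(d)  # exactly one basis function is non-zero
    return values

def sup_unit_ball_sq(basis_values):
    """By Cauchy-Schwarz, sup over the unit ball of t(x)^2 is the sum of psi(x)^2."""
    return (basis_values ** 2).sum(axis=1)

x = np.linspace(0.0, 1.0, 50, endpoint=False)
d = 8
psi_d = sup_unit_ball_sq(histogram_basis(x, d))  # constant function equal to d
```

The value does not depend on $x$, which is what makes both the sup norm and the integral against $P$ equal to $d$.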

Fourier Spaces: Let $k \ge 1$ be an integer and let, for all $x$ in $[0, 1]$,

\[ \psi_{1,k}(x) = \sqrt{2}\cos(2\pi kx), \quad \psi_{2,k}(x) = \sqrt{2}\sin(2\pi kx), \quad \psi_0 = 1_{[0,1]}. \]

Let $\mathcal{M}_n = \{1, \dots, n\}$ and, for all $m \in \mathcal{M}_n$, let $\Lambda_m = \{0, (1,k), (2,k),\ k = 1, \dots, m\}$. The space $S_m$, spanned by the family $(\psi_\lambda)_{\lambda\in\Lambda_m}$, is called the Fourier space with harmonic smaller than $m$ and the collection $(S_m, m \in \mathcal{M}_n)$ is called the collection of Fourier spaces. Let $B_m = \{t \in S_m,\ \|t\|_2 \le 1\}$. From Cauchy-Schwarz inequality, for all $x$ in $[0, 1]$,

\[ \sup_{t\in B_m} t^2(x) = \sum_{\lambda\in\Lambda_m} \psi_\lambda^2(x) = 1 + 2\sum_{k=1}^{m} \left(\cos^2(2\pi kx) + \sin^2(2\pi kx)\right) = 1 + 2m. \]

Hence, if $P$ is supported in $[0, 1]$,

\[ \left\| \sup_{t\in B_m} t^2 \right\|_\infty = 1 + 2m, \qquad P\left( \sup_{t\in B_m} t^2 \right) = 1 + 2m. \]

H4 holds on the collection of Fourier spaces when $P$ is supported on $[0, 1]$.
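The same kind of numerical check applies to the Fourier basis. This is a minimal sketch assuming NumPy; the function name `fourier_basis` is hypothetical and the grid of evaluation points is arbitrary.

```python
import numpy as np

def fourier_basis(x, m):
    """Evaluate psi_0 = 1, sqrt(2) cos(2 pi k x), sqrt(2) sin(2 pi k x), k = 1..m, on [0, 1]."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, m + 1)
    angles = 2.0 * np.pi * np.outer(x, k)
    cos_part = np.sqrt(2.0) * np.cos(angles)
    sin_part = np.sqrt(2.0) * np.sin(angles)
    const = np.ones((x.size, 1))
    return np.hstack([const, cos_part, sin_part])

x = np.linspace(0.0, 1.0, 101)
m = 5
# Sum of squared basis functions, i.e. sup of t(x)^2 over the unit ball of S_m.
psi_m = (fourier_basis(x, m) ** 2).sum(axis=1)
```

Each harmonic contributes $2(\cos^2 + \sin^2) = 2$, so the sum is $1 + 2m$ at every point of $[0, 1]$.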

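Wavelet Spaces: Assume that $(S_m, m \in \mathcal{M}_n)$ is a collection of wavelet spaces [W]. Assume moreover that the scaling function $\phi$ and the mother wavelet $\psi$ satisfy the following relation. There exists a constant $K_o > 0$ such that, for all $x$ in $\mathbb{R}$,

\[ \frac{1}{K_o} \le \sum_{k\in\mathbb{Z}} \phi^2(x - k) \le K_o, \qquad \frac{1}{K_o} \le \sum_{k\in\mathbb{Z}} \psi^2(x - k) \le K_o. \]

This condition is satisfied by the Haar basis, where $\phi = 1_{[0,1)}$, $\psi = 1_{[0,1/2)} - 1_{[1/2,1)}$, with $K_o = 1$. Then, for all $j \ge 0$, we have

\[ \frac{1}{K_o} \le \sum_{k\in\mathbb{Z}} \phi^2(2^jx - k) \le K_o, \qquad \frac{1}{K_o} \le \sum_{k\in\mathbb{Z}} \psi^2(2^jx - k) \le K_o. \]

Let $B_m = \{t \in S_m,\ \|t\|_2 \le 1\}$. From Cauchy-Schwarz inequality, we have

\[ \Psi_m(x) = \sup_{t\in B_m} t^2(x) = \sum_{\lambda\in\Lambda_m} \psi_\lambda^2(x) = \sum_{k\in\mathbb{Z}} \phi^2(x - k) + \sum_{j=0}^{J_m} 2^j \sum_{k\in\mathbb{Z}} \psi^2(2^jx - k), \]

so that

\[ \frac{2^{J_m}}{K_o} \le \Psi_m(x) \le K_o\Big(1 + \sum_{j=0}^{J_m} 2^j\Big) \le 2K_o 2^{J_m}. \]

Hence $b_m^2 = \|\Psi_m\|_\infty \le 2K_o 2^{J_m}$, $P(\Psi_m) \ge 2^{J_m}/K_o$.
H4 holds on the collection [W].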

SUPPLEMENTARY MATERIAL 

Supplement A: Proofs of Lemmas 5.1 and 5.2 

(http://lib.stat.cmu.edu/aos/???/???). In the supplementary material, we give complete proofs of the concentration lemmas 5.1 and 5.2. We use coupling results respectively of Berbee (1979) and Dedecker and Prieur (2005) to build sequences of independent random variables $(A_0^*, \dots, A_{p-1}^*)$ approximating the sequence of blocks $(A_0, \dots, A_{p-1})$, respectively in the $\beta$ and $\tau$ mixing case. We prove concentration lemmas equivalent to 5.1 and 5.2 for these approximating random variables. The main tools here are the concentration inequalities of Bousquet (2002) and Klein and Rio (2005) for the maximum of the empirical process. We prove finally some covariance inequalities to evaluate the expectation of $p(m)$ and deduce the rates $\epsilon_n = (\ln n)^{-1/2}$.

1. Notations. Let $(X_1, \dots, X_n)$ be real valued random variables identically distributed, with common density $s$ with respect to the Lebesgue measure $\mu$. Let $p$ and $q$ be integers such that $2pq = n$, and that $\sqrt{n}(\ln n)^2/2 \le p \le \sqrt{n}(\ln n)^2$. For all $k = 0, \dots, p - 1$, let $I_k = (2kq + 1, \dots, (2k + 1)q)$, $A_k = (X_j)_{j\in I_k}$ and $I = \cup_{k=0}^{p-1} I_k$. For all $t$ in $L^2(\mu)$ and all $x_1, \dots, x_q$ in $\mathbb{R}$,

\[ L_q(t)(x_1, \dots, x_q) = \frac{1}{q}\sum_{i=1}^{q} t(x_i), \qquad P_A t = \frac{1}{p}\sum_{k=0}^{p-1} L_q(t)(A_k) = \frac{2}{n}\sum_{i\in I} t(X_i), \]

\[ \nu_A(t) = (P_A - P)(t). \]
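The block construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than code from the paper: it assumes NumPy, uses 0-based array indices for the 1-based index sets $I_k$, and the helper names `make_blocks` and `block_empirical_mean` are hypothetical.

```python
import numpy as np

def make_blocks(x, p, q):
    """Return the p blocks A_k = (X_{2kq+1}, ..., X_{(2k+1)q}) as a (p, q) array."""
    x = np.asarray(x, dtype=float)
    if x.size != 2 * p * q:
        raise ValueError("expected n = 2 p q observations")
    # Cut the sample into 2p consecutive chunks of length q and keep every other one.
    return x.reshape(2 * p, q)[0::2]

def block_empirical_mean(t, blocks):
    """P_A t = (1/p) * sum_k L_q(t)(A_k), where L_q(t) averages t over one block."""
    l_q = t(blocks).mean(axis=1)   # L_q(t)(A_k) for k = 0..p-1
    return l_q.mean()

rng = np.random.default_rng(0)
p, q = 4, 3
x = rng.uniform(size=2 * p * q)
blocks = make_blocks(x, p, q)
t = np.cos
p_a_t = block_empirical_mean(t, blocks)
# Equivalent form (2/n) * sum over i in I of t(X_i).
direct = 2.0 / x.size * t(blocks).sum()
```

Only half of the observations enter $P_A$; the discarded chunks of length $q$ are the gaps that separate consecutive blocks, which is what the coupling lemmas exploit.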

For all $m$ in $\mathcal{M}_n$, we denote by $(\psi_\lambda)_{\lambda\in\Lambda_m}$ an orthonormal basis of $S_m$. The estimator $\hat s_{A,m}$ associated to the model $S_m$ is defined as

\[ \hat s_{A,m} = \sum_{\lambda\in\Lambda_m} (P_A\psi_\lambda)\psi_\lambda. \]

Classical computations show that, if $s_m$ denotes the orthogonal projection of $s$ onto $S_m$,

\[ s_m = \sum_{\lambda\in\Lambda_m} (P\psi_\lambda)\psi_\lambda, \quad \text{hence} \quad \|\hat s_{A,m} - s_m\|^2 = \sum_{\lambda\in\Lambda_m} (\nu_A\psi_\lambda)^2. \]

For all $m, m'$ in $\mathcal{M}_n$, let

\[ p(m) = \|s_m - \hat s_{A,m}\|^2 = \sup_{t\in B_m} (\nu_A(t))^2 = \sum_{\lambda\in\Lambda_m} (\nu_A(\psi_\lambda))^2, \qquad \delta(m,m') = 2\nu_A(s_m - s_{m'}). \]

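Hereafter $W_0, \dots, W_{p-1}$ denotes a resampling scheme, $\bar W = p^{-1}\sum_{i=0}^{p-1} W_i$, and $P_A^W$ denotes the resampling empirical process, defined for all measurable functions $t$ by

\[ P_A^W t = \frac{1}{p}\sum_{i=0}^{p-1} W_i L_q t(A_i). \]

We introduce also $\nu_A^W = P_A^W - \bar W P_A$ and $C_W = (\mathrm{Var}(W_1 - \bar W))^{-1}$. For any orthonormal basis $(\psi_\lambda)_{\lambda\in\Lambda_m}$ of $S_m$, let

\[ p_W(m) = C_W \sum_{\lambda\in\Lambda_m} E_W\left((\nu_A^W(\psi_\lambda))^2\right). \]

Let $\epsilon_n = (\ln n)^{-1/2}$ and let $\kappa > 0$. Let $\mathcal{M}$ denote one of the sets $\mathcal{M}_n$ or $\mathcal{M}_n^2$. When $\mathcal{M} = \mathcal{M}_n$, for all $\mathbf{m} = m$ in $\mathcal{M}$ let $R_{A,\mathbf{m}} = R_{A,m}$, and when $\mathcal{M} = \mathcal{M}_n^2$, for all $\mathbf{m} = (m, m')$ in $\mathcal{M}$, let $R_{A,\mathbf{m}} = R_{A,m} \vee R_{A,m'}$. For all $\mathbf{m}$ in $\mathcal{M}$, let

\begin{align*}
f_1(m,\kappa) &= p(m) - \frac{2D_{A,m}}{n} - \kappa\epsilon_n \frac{R_{A,m}}{n}, \tag{1.1} \\
f_2(m,\kappa) &= \frac{2D_{A,m}}{n} - p(m) - \kappa\epsilon_n \frac{R_{A,m}}{n}, \tag{1.2} \\
f_3(m,\kappa) &= p(m) - p_W(m) - \kappa\epsilon_n \frac{R_{A,m}}{n}, \tag{1.3} \\
f_4(m,\kappa) &= p_W(m) - p(m) - \kappa\epsilon_n \frac{R_{A,m}}{n}, \tag{1.4} \\
f_5(\mathbf{m},\kappa) &= \delta(m,m') - \kappa\epsilon_n \frac{R_{A,m} \vee R_{A,m'}}{n}. \tag{1.5}
\end{align*}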

In this paper, we state and prove the following results.

Lemma 1.1. Let $X_1, \dots, X_n$ be a strictly stationary sequence of real valued random variables with common density $s$ and let $(S_m)_{m\in\mathcal{M}_n}$ be a collection of regular wavelet spaces [W] satisfying assumptions H3, H4. Let $p$, $q$ be two integers satisfying $2pq = n$ and $\frac{1}{2}\sqrt{n}(\ln n)^2 \le p \le \sqrt{n}(\ln n)^2$. Assume that there exists $\theta > 5$ such that $X_1, \dots, X_n$ are arithmetically [AR($\theta$)] $\tau$-mixing and satisfy S($\tau$, W). There exist constants $\kappa_1$, $\kappa_2$, such that, for all $i = 1, \dots, 5$, for all $m$ in $\mathcal{M}$,

\[ E\left( \sup_{m\in\mathcal{M}} (f_i(m,\kappa_1))_+ \right) \le \frac{\kappa_2}{n}. \tag{1.6} \]

Lemma 1.2. Let $\theta > 1$ and let $(X_n)_{n\in\mathbb{Z}}$ be an arithmetically [AR($\theta$)] $\beta$-mixing process satisfying S($\beta$). Let $(S_m)_{m\in\mathcal{M}_n}$ be a collection of linear spaces satisfying Assumptions H1, H2, H3, H4. Let $p$, $q$ such that $2pq = n$,

This paper is not self-contained. Some notations and assumptions have been introduced in the main paper. These lemmas are central in the proof of the oracle inequalities in the main paper. We believe that the tools involved are interesting by themselves and we group them in three subsections. The first one gives technical results on the resampling penalty. The second gives coupling results for mixing data and the third one provides the useful concentration inequalities for independent processes. The proofs of the lemmas are given respectively in Sections 6.1 and 6.2.

2. Some results on resampling penalties. 

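Lemma 2.1. Let $(t_\lambda)_{\lambda\in\Lambda}$ be a set of real valued square integrable functions defined on a measurable space $(\mathbb{X}, \mathcal{X})$. Let $A_0, \dots, A_{p-1}$ be $\mathbb{X}$-valued random variables with common law $P$ and let $(W_0, \dots, W_{p-1})$ be a resampling scheme of $A_0, \dots, A_{p-1}$. For all $t$, let

\[ P_A t = \frac{1}{p}\sum_{i=0}^{p-1} t(A_i), \qquad \nu_A t = (P_A - P)t, \qquad p(\Lambda) = \sum_{\lambda\in\Lambda} (\nu_A(t_\lambda))^2. \tag{1.7} \]

Let $\bar W = p^{-1}\sum_{i=0}^{p-1} W_i$ and $C_W = \mathrm{Var}(W_1 - \bar W)^{-1}$. Let

\[ \nu_A^W t = \frac{1}{p}\sum_{i=0}^{p-1} (W_i - \bar W)t(A_i), \qquad p_W(\Lambda) = C_W \sum_{\lambda\in\Lambda} E_W\left((\nu_A^W(t_\lambda))^2\right), \]

$T = \sum_{\lambda\in\Lambda} (t_\lambda - Pt_\lambda)^2$, $D = PT$ and

\[ U = \frac{1}{p(p-1)} \sum_{i\neq j=0}^{p-1} \sum_{\lambda\in\Lambda} (t_\lambda(A_i) - Pt_\lambda)(t_\lambda(A_j) - Pt_\lambda). \]

Then, we have

\[ p(\Lambda) = \frac{1}{p}P_A T + \frac{p-1}{p}U, \qquad p_W(\Lambda) = \frac{1}{p}(P_A T - U), \qquad p(\Lambda) - p_W(\Lambda) = U. \tag{2.1} \]

When, moreover, $A_0, \dots, A_{p-1}$ are i.i.d,

\[ E(p_W(\Lambda)) = \frac{D}{p}, \qquad p_W(\Lambda) - \frac{D}{p} = \frac{1}{p}(\nu_A T - U). \tag{2.2} \]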

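Proof: In the independent case, it is clear that $E(P_A T) = D$ and $E(U) = 0$, thus (2.2) comes from (2.1). We only have to prove the two first equalities, the third one being an immediate consequence.

\begin{align*}
p(\Lambda) &= \frac{1}{p^2}\sum_{i=0}^{p-1}\sum_{\lambda\in\Lambda} (t_\lambda(A_i) - Pt_\lambda)^2 + \frac{1}{p^2}\sum_{i\neq j}\sum_{\lambda\in\Lambda} (t_\lambda(A_i) - Pt_\lambda)(t_\lambda(A_j) - Pt_\lambda) \\
&= \frac{1}{p}P_A T + \frac{p-1}{p}U.
\end{align*}

Since $\sum_{i=0}^{p-1} (W_i - \bar W) = 0$, we have

\[ \nu_A^W(t_\lambda) = \frac{1}{p}\sum_{i=0}^{p-1} (W_i - \bar W)t_\lambda(A_i) = \frac{1}{p}\sum_{i=0}^{p-1} (W_i - \bar W)(t_\lambda(A_i) - Pt_\lambda). \tag{2.3} \]

Let $E_{i,j} = C_W E[(W_i - \bar W)(W_j - \bar W)]$. Since the weights are exchangeable, for all $i = 0, \dots, p-1$, $E_{i,i} = E_{1,1} = 1$ and for all $i \neq j$, $E_{i,j} = E_{1,2}$. Taking the square in (2.3), we obtain that

\begin{align*}
C_W E_W\left((\nu_A^W(t_\lambda))^2\right) &= \frac{1}{p^2}\sum_{i,j=0}^{p-1} E_{i,j}(t_\lambda(A_i) - Pt_\lambda)(t_\lambda(A_j) - Pt_\lambda) \\
&= \frac{1}{p^2}\sum_{i=0}^{p-1} (t_\lambda(A_i) - Pt_\lambda)^2 + \frac{E_{1,2}}{p^2}\sum_{i\neq j} (t_\lambda(A_i) - Pt_\lambda)(t_\lambda(A_j) - Pt_\lambda).
\end{align*}

Summing in $\lambda$, we obtain that

\[ p_W(\Lambda) = \frac{1}{p}P_A T + \frac{p-1}{p}E_{1,2}U. \]

Since $\sum_{i} (W_i - \bar W) = 0$,

\[ 0 = E\left[ \Big( \sum_{i=0}^{p-1} (W_i - \bar W) \Big)^2 \right] = C_W^{-1}\left(p + p(p-1)E_{1,2}\right). \]

Hence $E_{1,2} = -(p-1)^{-1}$, and the proof is concluded.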
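A consequence of $E_{1,1} = 1$ and $E_{1,2} = -(p-1)^{-1}$ is that, for a single function $t$ with values $a_i = t(A_i)$, the quantity $C_W E_W((\nu_A^W t)^2)$ equals the unbiased sample variance of the $a_i$ divided by $p$, whatever the exchangeable weights. The Python sketch below checks this on a small example by exact enumeration. It is an illustration only: it assumes NumPy, it takes as weights a uniformly random permutation of a fixed vector `w` (one particular exchangeable scheme, chosen here for convenience), and the function names are hypothetical.

```python
import itertools
import numpy as np

def resampling_term_exact(a, w):
    """C_W * E_W[(nu_A^W t)^2] when W is a uniformly random permutation of w.

    a[i] plays the role of t(A_i); the expectation over W is computed exactly
    by enumerating all permutations, so p should stay small.
    """
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    p = a.size
    w_bar = w.mean()                        # the mean of the weights is permutation invariant
    c_w = 1.0 / np.mean((w - w_bar) ** 2)   # 1 / Var(W_1 - W_bar)
    second_moment = np.mean([
        (np.dot(np.array(perm) - w_bar, a) / p) ** 2
        for perm in itertools.permutations(w)
    ])
    return c_w * second_moment

def resampling_term_closed_form(a):
    """Closed form implied by E_{1,1} = 1 and E_{1,2} = -1/(p-1): sample variance / p."""
    a = np.asarray(a, dtype=float)
    return a.var(ddof=1) / a.size

a = np.array([0.3, -1.2, 0.7, 2.0, 0.1])
w = np.array([2.0, 2.0, 0.0, 0.0, 1.0])     # hypothetical hold-out style weights
exact = resampling_term_exact(a, w)
closed = resampling_term_closed_form(a)
```

Summing the closed form over $\lambda$ recovers $p_W(\Lambda) = p^{-1}(P_A T - U)$, since $P_A T - U$ is the sum over $\lambda$ of the unbiased sample variances of $(t_\lambda(A_i))_i$.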
3. Coupling results. The first coupling lemma was obtained in Lerasle (2009a), as a consequence of a result from Dedecker and Prieur (2005).

Lemma 3.1. [$\tau$-coupling, Claim 1 p. 17 in Lerasle (2009a)] Assume that the process $(X_1, \dots, X_n)$ is $\tau$-mixing and let $p$, $q$ and $A_0, \dots, A_{p-1}$ be respectively the integers and the random variables defined in Section 1. There exist random variables $A_0^*, \dots, A_{p-1}^*$ such that:

1. for all $k = 0, \dots, p - 1$, $A_k^* = (X^*_{2kq+1}, \dots, X^*_{(2k+1)q})$ has the same law as $A_k$,

2. for all $k = 0, \dots, p - 1$, $A_k^*$ is independent of $A_0, \dots, A_{k-1}, A_0^*, \dots, A_{k-1}^*$,

3. for all $k = 0, \dots, p - 1$, $E(d_q(A_k, A_k^*)) \le q\tau_q$.

$\beta$-mixing data satisfy the following very important lemma, which is due to Viennet (1997).

Lemma 3.2. (Lemma 5.1 in Viennet (1997)) Assume that the process $(X_1, \dots, X_n)$ is $\beta$-mixing and let $p$, $q$ and $A_0, \dots, A_{p-1}$ be respectively the integers and the random variables defined in Section 1. There exist random variables $A_0^*, \dots, A_{p-1}^*$ such that:

1. for all $k = 0, \dots, p - 1$, $A_k^* = (X^*_{2kq+1}, \dots, X^*_{(2k+1)q})$ has the same law as $A_k$,

2. for all $k = 0, \dots, p - 1$, $A_k^*$ is independent of $A_0, \dots, A_{k-1}, A_0^*, \dots, A_{k-1}^*$,

3. for all $k = 0, \dots, p - 1$, $P(A_k \neq A_k^*) \le \beta_q$.

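For all functionals $T = F(A_0, \dots, A_{p-1})$, let $T^* = F(A_0^*, \dots, A_{p-1}^*)$, where the random variables $(A_k^*)$ are given by the previous coupling lemmas. In particular, we will use repeatedly the notations

\[ P_A^* t = \frac{2}{n}\sum_{i\in I} t(X_i^*), \quad \nu_A^* = P_A^* - P, \quad (P_A^W)^* t = \frac{1}{p}\sum_{k=0}^{p-1} W_k L_q t(A_k^*), \quad (\nu_A^W)^* = (P_A^W)^* - \bar W P_A^*, \quad \delta^*(m,m') = 2\nu_A^*(s_m - s_{m'}), \]

\[ p^*(m) = \sum_{\lambda\in\Lambda_m} (\nu_A^*\psi_\lambda)^2, \qquad p_W^*(m) = C_W \sum_{\lambda\in\Lambda_m} E_W\left(((\nu_A^W)^*\psi_\lambda)^2\right), \tag{3.1} \]

\[ U_m = \frac{1}{p(p-1)} \sum_{\lambda\in\Lambda_m} \sum_{i\neq j=0}^{p-1} (L_q(\psi_\lambda)(A_i) - P\psi_\lambda)(L_q(\psi_\lambda)(A_j) - P\psi_\lambda), \]

\[ U_m^* = \frac{1}{p(p-1)} \sum_{\lambda\in\Lambda_m} \sum_{i\neq j=0}^{p-1} (L_q(\psi_\lambda)(A_i^*) - P\psi_\lambda)(L_q(\psi_\lambda)(A_j^*) - P\psi_\lambda). \]

The first Lemma is a straightforward consequence of Lemma 3.2.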
Lemma 3.3. Let $X_1, \dots, X_n$ be real valued, stationary, $\beta$-mixing random variables. Let $p$, $q$ be two integers such that $2pq = n$ and let $(S_m)_{m\in\mathcal{M}_n}$ be any collection of spaces of measurable functions. Let $A_0^*, \dots, A_{p-1}^*$ be the independent random variables given by Lemma 3.2. Let $p$, $p_W$, $\delta$, $p^*$, $p_W^*$ and $\delta^*$ be the associated functions defined on $\mathcal{M}_n$ and $\mathcal{M}_n^2$ in (3.1). There exists an event $\Omega_n^{(1)}$ with $P(\Omega_n^{(1)}) \ge 1 - p\beta_q$ where, for all $m, m'$ in $\mathcal{M}_n$,

\[ p(m) = p^*(m), \quad p_W(m) = p_W^*(m), \quad \delta(m,m') = \delta^*(m,m'), \quad U_m = U_m^*. \]

Proof: Consider the event $\Omega_n^{(1)} = \{\forall i = 0, \dots, p - 1,\ A_i = A_i^*\}$. It comes from Viennet's coupling lemma that $P(\Omega_n^{(1)}) \ge 1 - p\beta_q$ and it is clear that, on $\Omega_n^{(1)}$, the conclusion of Lemma 3.3 holds.

Lemma 3.4. Let $X_1, \dots, X_n$ be stationary random variables, real valued, $\tau$-mixing and with common density $s$. Let $p$ and $q$ be two integers such that $2pq = n$ and let $A_0^*, \dots, A_{p-1}^*$ be the random variables given by Lemma 3.1. Let $\mathcal{M}_n$ be a collection of models. Let $p$, $p_W$, $\delta$, $p^*$, $p_W^*$, $\delta^*$ be the associated functions defined on $\mathcal{M}_n$ and $(\mathcal{M}_n)^2$ in (3.1). Let $MC_n$ be the mixing complexity of $\mathcal{M}_n$ defined by



MC n = £ 



m.eMn 



xeA„ 



sup Lip^x) + ||s|||.M n | sup Lip(t) 



AeA„ 



Then 



E sup \p(m) -p*(m)\\ < Ar q MC n , 

\m£M n J 

E ( sup \pw(m)-p* w (m)\] < ^MC n , 

\meM n ) V 

El sup 8{m,m') - S*(m,m') J < Ar q MC n . 



(3.2) 
(3.3) 
(3.4) 

Proof: For all $m$ in $\mathcal{M}_n$, we have

\[ E\left( \sup_{m\in\mathcal{M}_n} |p(m) - p^*(m)| \right) \le \sum_{m\in\mathcal{M}_n} E\left(|p(m) - p^*(m)|\right). \]

Moreover, for all $m$ in $\mathcal{M}_n$,
$|p(m) - p^*(m)|$



]T {{P A - P)i> x f - {{P* A - pty A ) s 



AGA„ 



£ ((^ + ^)Va)((Pa-P1)^ 
AeA m 

l p_1 

|p(m)-p*(m)| < ^ K^ + ^aI-EI^^)^)-^^)^ 

^ fc=0 

sup Lip g (Lg ) ) - E d i ( Afc ' ^ ) 



agA 
< 4 



< - 



E i^i 

AGA m 

E I^A 

AeA m 



AeA„ 



fc=0 



sup Lip Oa ) - E d 9 ( ' ^fe ) 

AGA m P ^ 



We take the expectation in this last inequality and we use Lemma 3.1 to 
obtain (3.2). From Lemma 2.1, 

\pw{m) - p* w (m)\ = i | (Pa - Pl)(T m ) - (*7 m - O . 

(Pa — P A )T m is equal to 

1 P ^ 

E - E (W>a)(40 - L q (if> x )(A%)) (L q (i> x )(A k ) + L ff ty A )(A£) - 2PVa) , 

AGA m P k=0 

thus 

p-1 



4 

< - 



E I^A 

AGA m 

E 

AeAm 



1 

sup Up q (Lg(tpx))-y2d q (A k ,A* k ) 
AGA m P fc=Q 

sup Lip(^A)-E d 9(^ fc '^)- 



aga 



fc=0 



Moreover 



U — U* 



1 £=J 

— pr E E MMW-PMMMW-L^xW))) 



pip - 1) 



i^j=0 AGA„ 
p-1 



+ T ^ T y E E (^(V>a(4*)) - PVa)(^(^a(^)) - ^Wa(^))), 

P ^ ' »5*2=0 AGA m 
thus



\TJ — TJ* I < _ 



E 1^1 



AeA,- 



sup Lip (-^a)- ^2d q (A k ,A* k ) 

\£Am P ,„_n 



k=0 



Therefore, 



E(\p w (m)-p* w (m)\) < — 



< 



8r„ 



E 1^1 

AeA m 

E \1» 



lP -i 

sup Lip(^A)-E E K(^' j4 fc)) 

AGA m P ~ 



AGA„ 



sup Lip (tp x ) 

AGA m 



k=0 



Thus E (sup mGMn \pw(m) — Pw( m )\) ^ s u PP er bounded by 



£ E(|p^(m)-pfc(m)|)<^ £ 



E 



AeA„ 



sup Lip(/i/>A)- 
AeA m 



^{^Vm,m'eM n S(m,m') - 5*(rn,m')) < ^ fnm , e _ Mn E m') - 5*(m,m')\) 

and, for all m, m! in A^ n , 

E (|J(m,m') - 5*(m,m')\) = 2K(\(P A - PX)(s m - s m ,)|) 
P-i 



< — Lip(s m - s m > 
pq 



^2E(d q (A k ,A* k ))<2T q Up( 



k=0 



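For all $x$, $y$ in $\mathbb{R}$ and all $m, m'$ in $\mathcal{M}_n$,

\[ |(s_m - s_{m'})(x) - (s_m - s_{m'})(y)| \le \|s\|\left( \sup_{t\in B_m} \mathrm{Lip}(t) + \sup_{t\in B_{m'}} \mathrm{Lip}(t) \right)|x - y|. \]

Hence, $\mathrm{Lip}(s_m - s_{m'}) \le \|s\|\left(\sup_{t\in B_m} \mathrm{Lip}(t) + \sup_{t\in B_{m'}} \mathrm{Lip}(t)\right)$, thus

\[ E\left( \sup_{m,m'\in\mathcal{M}_n} \delta(m,m') - \delta^*(m,m') \right) \le 4\tau_q\|s\|\,|\mathcal{M}_n| \sup_{m\in\mathcal{M}_n}\sup_{t\in B_m} \mathrm{Lip}(t). \]

Lemma 3.5. Assume that $2pq = n$, that $\sqrt{n}(\ln n)^2/2 \le p \le \sqrt{n}(\ln n)^2$, and that there exist constants $C$ and $\theta > 0$ such that, for all $q$ in $\mathbb{N}^*$, $\beta_q \le Cq^{-(1+\theta)}$. Then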

\[ p\beta_q \le 2^{1+\theta}C\,\frac{(\ln n)^{4+2\theta}}{n^{\theta/2}}. \]

Proof: The proof is straightforward. 
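The arithmetic behind this lemma can be made explicit. Since $q = n/(2p)$ and $\beta_q \le Cq^{-(1+\theta)}$, one gets $p\beta_q \le Cp^{2+\theta}(2/n)^{1+\theta}$, which is increasing in $p$; at the upper end $p = \sqrt{n}(\ln n)^2$ this equals $2^{1+\theta}C(\ln n)^{4+2\theta}n^{-\theta/2}$. The Python sketch below evaluates both sides. It is an illustration only, assuming that the conclusion of the lemma is a bound of this order; the constant $2^{1+\theta}C$ is the one obtained from this computation, $q$ is treated as a real number, and the function name and its `frac` parameter are hypothetical.

```python
import math

def p_beta_q_bound(n, theta, C=1.0, frac=1.0):
    """Compare p * beta_q with 2^(1+theta) * C * (ln n)^(4+2 theta) / n^(theta/2).

    p is set to frac * sqrt(n) * (ln n)^2 with frac in [1/2, 1], beta_q is taken
    equal to its upper bound C * q^-(1+theta), and q = n / (2 p) is not rounded.
    """
    log_n = math.log(n)
    p = frac * math.sqrt(n) * log_n ** 2
    q = n / (2.0 * p)
    lhs = p * C * q ** (-(1.0 + theta))
    rhs = 2.0 ** (1.0 + theta) * C * log_n ** (4.0 + 2.0 * theta) / n ** (theta / 2.0)
    return lhs, rhs

lhs_max, rhs = p_beta_q_bound(10 ** 6, theta=2.0)            # largest admissible p
lhs_min, _ = p_beta_q_bound(10 ** 6, theta=2.0, frac=0.5)    # smallest admissible p
```

The two sides coincide at the largest admissible $p$ and the left-hand side is strictly smaller for smaller $p$, so the bound holds over the whole range allowed by the lemma.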

Lemma 3.6. Assume that there exist constants $\kappa_- > 0$, $\kappa_+ > 0$ such that $\kappa_-\sqrt{n}(\ln n)^{-2} \le q \le \kappa_+\sqrt{n}(\ln n)^{-2}$ and assume that, for all $q$, $\tau_q \le q^{-(1+\theta)}$. Let $MC_n$ be the mixing complexity defined in Lemma 3.4 for the collection of models [W]. Then, there exists a constant $\kappa$ such that

\[ \tau_q MC_n \le \kappa\,\frac{(\ln n)^{2(1+\theta)}}{n^{(\theta-3)/2}}. \]



Proof: Let us first recall some basic inequalities that hold in [W]: let 

Koo = (V^||0||oo) V IMloo, K L = (2V2Lip(0)) V Lipty), K BV = AK L . 
Then for all j > 0, we have ||V'j,fc||oo < K OQ 2^ 2 , 



(3.5) 



k&1 



< AK^' 2 , 



(3.6) Lfpfe) < K L 2^' 2 , 

(3-7) Uj,k\\ B v ^ I<B V 2 j/2 . 

Since Card(7W n ) < Inn/ In 2, we obtain that 

([Inn] /In 2 
2 2j ' + (lnn)2 3j '/ 2 ] < 
3=0 



We conclude the proof of Lemma 3.6, saying that q > «_^/ra(lnn) 2 implies 

C (lnn) 2 ( 1+0 ) 



4. Concentration inequalities. The following concentration inequalities were proved in Lerasle (2009b). These inequalities derive from Bousquet's and Klein & Rio's versions of Talagrand's concentration inequality for the supremum of the empirical process (see Bousquet (2002); Klein and Rio (2005)).

Theorem 4.1. Let $A_0, \dots, A_{p-1}$ be iid random variables valued in a measurable space $(\mathbb{X}, \mathcal{X})$, with common law $P$. Let $S$ be a symmetric class of functions bounded by $b$. For all $t$ in $S$, let $P_A t = p^{-1}\sum_{i=0}^{p-1} t(A_i)$, $v^2 = \sup_{t\in S} P[(t - Pt)^2]$, $Z = \sup_{t\in S} (P_A - P)t$, $D = pE(Z^2)$. For all $x > 0$ and all $\epsilon$ in $(0, 1]$, with probability larger than $1 - 2e^{-x}$,



p 



( D I ( v 2 x fbx\ 2 ' 



The constant k = (16(ln 2) 2 + 8) works. 

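Proof: We repeat the proof of this theorem for the sake of completeness. From Bousquet's and Klein & Rio's bound, we have, for all $x > 0$, for all $0 < \epsilon \le 1$,

\[ P\left( |Z - E(Z)| > \epsilon E(Z) + \sqrt{\frac{2v^2x}{p}} + 2\epsilon^{-1}\frac{bx}{p} \right) \le 2e^{-x}. \tag{4.1} \]

Let then $\Omega_x = \left\{ |Z - E(Z)| \le \epsilon E(Z) + \sqrt{\frac{2v^2x}{p}} + 2\epsilon^{-1}\frac{bx}{p} \right\}$. On $\Omega_x$,

\[ Z^2 \le (1 + \epsilon)^3(E(Z))^2 + 8\epsilon^{-1}\frac{v^2x}{p} + 16\epsilon^{-3}\frac{b^2x^2}{p^2}. \]

Using Cauchy-Schwarz inequality, we have $(E(Z))^2 \le E(Z^2) = p^{-1}D$; since $\epsilon \le 1$, $(1 + \epsilon)^3 \le 1 + 7\epsilon$, thus, on $\Omega_x$,

\[ Z^2 \le (1 + 7\epsilon)\frac{D}{p} + 8\epsilon^{-1}\frac{v^2x}{p} + 16\epsilon^{-3}\frac{b^2x^2}{p^2}. \]

As $S$ is symmetric, $Z$ is positive. Hence, we also have, on $\Omega_x$,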



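\begin{align*}
Z^2 &\ge (1 - \epsilon)^2(E(Z))^2 - 2(1 - \epsilon)E(Z)\left( \sqrt{\frac{2v^2x}{p}} + 2\epsilon^{-1}\frac{bx}{p} \right) \\
&\ge (1 - 4\epsilon)(E(Z))^2 - \frac{4}{\epsilon}\frac{v^2x}{p} - \frac{8}{\epsilon^3}\frac{b^2x^2}{p^2}.
\end{align*}

From Lemma 4.2 below, we have, for $\kappa = 2 + 5\pi/2$,

\[ (E(Z))^2 \ge (1 - 3\epsilon^2)E(Z^2) - \kappa\frac{v^2}{p} - \frac{16}{\epsilon^2}\frac{b^2}{p^2}. \]

Hence, on $\Omega_x$,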

o , . . q . / 4x \ v 2 ( 16 8x 2 \ b 2 

Let us say that we can take x > In 2, otherwise, the conclusion of the theorem 
is trivial. Moreover, as e < 1, we have 1 < ((ln2)e) _1 x, hence 

K \ v 2 x ( 16 _ \ b 2 

e 3„2 - 



Z 2 < (1 - 24e)E(Z 2 ) - | 
Let us now prove Lemma 4.2. 



In 2/ ep \(ln2) 2 y e 3 j> 



Lemma 4.2. Let $A_0, \dots, A_{p-1}$ be iid random variables valued in a measurable space $(\mathbb{X}, \mathcal{X})$, with common law $P$. Let $S$ be a symmetric class of functions bounded by $b$. For all $t$ in $S$, let $P_A t = p^{-1}\sum_{i=0}^{p-1} t(A_i)$, $v^2 = \sup_{t\in S} P[(t - Pt)^2]$, $Z = \sup_{t\in S} (P_A - P)t$. For all $\epsilon > 0$, we have


5vr \ v 2 16 b 2 



Proof of Lemma 4.2: We have

\[ \mathrm{Var}(Z) = E\left(|Z - E(Z)|^2\right) = \int_0^\infty P\left(|Z - E(Z)|^2 > x\right)dx = \int_0^\infty P\left(|Z - E(Z)| > x\right)2x\,dx. \]

Let now $\epsilon$ in $(0, 1]$, we have, using the following change of variables $x = \epsilon E(Z) + \sqrt{2v^2y/p} + 2\epsilon^{-1}by/p$ and the concentration inequality (4.1),
reE(Z) 

Var(Z) = / P(\Z - E(Z)\> x)2xdx 
Jo 

roo 

+ / P(\Z-E(Z)\ > x)2xdx < (eE(Z)) 2 
!jf e- y ^eE(Z) + 



= (eE(Z)) 2 + eE(Z)W2— / ±—dy + 4E(Z)- e~ y dy 
V V Jo y/U P Jo 

v 2 f°° 4 /u2 6 Z" 00 / _ i \ 

p Jo e\l p p Jo \ V 2 y ) 

8 6 2 Z" 00 _„ , 
+ -3-3 / ye y dy. 
e 2 P 2 Jo 
Using that $\int_0^\infty e^{-y}dy = \int_0^\infty ye^{-y}dy = 1$ and

\[ \int_0^\infty \frac{e^{-y}}{\sqrt{y}}dy = 2\int_0^\infty \sqrt{y}e^{-y}dy = \sqrt{\pi}, \]

we obtain, using repeatedly the inequality $2ab \le a^2 + b^2$,



Var(Z) < (AW? + 2 £ E( Z ), /=? 4- 4E(Z)^ + 2^ + ijlft + 

| 2p p p e V 2p p e z p A 

n2 . . 5vr\ 16 6 2 



<3(eE(Z))^+ 2 + — _ + __□ 
2 J p e z p A 



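We deduce from this theorem the following concentration inequality for U-statistics.

Corollary 4.3. Let $A_0, \dots, A_{p-1}$ be i.i.d random variables valued in a measurable space $(\mathbb{X}, \mathcal{X})$, with common law $P$. Let $\mu$ be a measure on $(\mathbb{X}, \mathcal{X})$ and let $(t_\lambda)_{\lambda\in\Lambda_m}$ be a set of functions in $L^2(\mu)$. Let

\[ B = \Big\{ t = \sum_{\lambda\in\Lambda_m} a_\lambda t_\lambda,\ \sum_{\lambda\in\Lambda_m} a_\lambda^2 \le 1 \Big\}, \qquad D = E\Big( \sup_{t\in B} (t(A_1) - Pt)^2 \Big), \]

\[ v^2 = \sup_{t\in B} P[(t - Pt)^2], \qquad b = \sup_{t\in B} \|t\|_\infty. \]

Let $U$ be the following U-statistics

\[ U = \frac{1}{p(p-1)} \sum_{i\neq j=0}^{p-1} \sum_{\lambda\in\Lambda_m} (t_\lambda(A_i) - Pt_\lambda)(t_\lambda(A_j) - Pt_\lambda). \]

For all $x > 0$, with probability larger than $1 - 4e^{-x}$,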

(«7 + Kt + (£)■))• 

where the constant $\kappa'$ can be taken equal to $2(\kappa + 4(\ln 2)^{-1})$, for the constant $\kappa$ defined in Theorem 4.1.

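Proof of Corollary 4.3. From Lemma 2.1, we have $U = p(p-1)^{-1}(Z^2 - p^{-1}P_A T)$, where

\[ Z = \sqrt{\sum_{\lambda\in\Lambda_m} ((P_A - P)t_\lambda)^2} = \sup_{t\in B} (P_A - P)t, \qquad T = \sum_{\lambda\in\Lambda_m} (t_\lambda - Pt_\lambda)^2 = \sup_{t\in B} (t - Pt)^2. \]

From the same lemma, we also have $E(Z^2) = p^{-1}PT$, so that

\[ U = p(p-1)^{-1}\left( Z^2 - E(Z^2) - \frac{1}{p}(P_A - P)T \right). \tag{4.2} \]

From Bernstein's inequality, we have, with probability larger than $1 - 2e^{-x}$,

\[ |(P_A - P)T| \le \sqrt{\frac{2\mathrm{Var}(T(A_1))x}{p}} + \frac{\|T\|_\infty x}{3p}. \]

We have

\[ \|T\|_\infty = \Big\| \sup_{t\in B} (t - Pt)^2 \Big\|_\infty \le 4b^2. \]

Moreover

\[ \mathrm{Var}(T(A_1)) \le \|T\|_\infty PT \le 4b^2D. \]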

Hence, with probability 1 — 2e~ x , 
(4-3) 

kb 2 Dx 4b 2 x „ /2 4\ 6 2 x „ 4 6 2 x 2 



p 3p \e 3 J p (ln2)e p 

Equations (4.2), (4.3) and Theorem 4.1 ensure that, on an event $\Omega_x$ so that $P((\Omega_x)^c) \le 4e^{-x}$,

\ P e \ P \£P J J J 

where the constant $\kappa'$ can be taken equal to $2(\kappa + 4(\ln 2)^{-1})$, for the constant $\kappa$ defined in Theorem 4.1. Finally, let us recall the following consequence of Bernstein's inequality.

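Corollary 4.4. Let $A_0, \dots, A_{p-1}$ be i.i.d. random variables valued in a measurable space $(\mathbb{X}, \mathcal{X})$, with common law $P$. Let $\mu$ be a measure on $(\mathbb{X}, \mathcal{X})$ and let $(\psi_\lambda)_{\lambda \in \Lambda}$ be an orthonormal system in $L^2(\mu)$. Let $L$ be a linear functional defined on $L^2(\mu)$ and let $B = \{ t = \sum_{\lambda \in \Lambda} a_\lambda L\psi_\lambda,\ \sum_{\lambda \in \Lambda} a_\lambda^2 \le 1 \}$, $v^2 = \sup_{t \in B} P[(t - Pt)^2]$, $b = \sup_{t \in B} \|t\|_\infty$. Let $u$ be a linear combination of $(\psi_\lambda)_{\lambda \in \Lambda}$ and $\eta > 0$. For all $x > 0$,

\[ P\Big( \nu_A(Lu) > \frac{\eta}{2}\|u\|^2 + \frac{1}{\eta}\Big( \frac{2v^2 x}{p} + \frac{b^2 x^2}{9p^2} \Big) \Big) \le e^{-x}. \]

Proof of Corollary 4.4. From Bernstein's inequality, for all $x > 0$, there exists $\Omega_x$ satisfying $P(\Omega_x) \ge 1 - e^{-x}$ such that, on $\Omega_x$,

\[ \nu_A(Lu) \le \sqrt{\frac{2\,\mathrm{Var}(Lu(A_1))\,x}{p}} + \frac{\|Lu\|_\infty x}{3p}. \]

As $L(u/\|u\|)$ belongs to $B$, we have

\[ \|Lu\|_\infty \le \|u\| b, \qquad \mathrm{Var}(Lu(A_1)) \le \|u\|^2 v^2. \]

Hence, on $\Omega_x$, for all $\eta > 0$, we have

\[ \nu_A(Lu) \le \|u\| \Big( \sqrt{\frac{2v^2 x}{p}} + \frac{bx}{3p} \Big) \le \frac{\eta}{2}\|u\|^2 + \frac{1}{2\eta}\Big( \sqrt{\frac{2v^2 x}{p}} + \frac{bx}{3p} \Big)^2 \le \frac{\eta}{2}\|u\|^2 + \frac{1}{\eta}\Big( \frac{2v^2 x}{p} + \frac{b^2 x^2}{9p^2} \Big). \qquad \square \]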
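The last chain in the proof of Corollary 4.4 only uses $ac \le \frac{\eta}{2}a^2 + \frac{1}{2\eta}c^2$ and $(c_1 + c_2)^2 \le 2c_1^2 + 2c_2^2$. As a hedged sanity check (not part of the paper), the sketch below draws random parameter values and verifies that the Bernstein-type term never exceeds the bound of the corollary. All ranges and names are arbitrary illustrative choices.

```python
import math
import random

def bernstein_term(norm_u, v2, b, x, p):
    """||u|| * (sqrt(2 v^2 x / p) + b x / (3 p)), the bound given by Bernstein's inequality."""
    return norm_u * (math.sqrt(2 * v2 * x / p) + b * x / (3 * p))

def corollary_bound(norm_u, v2, b, x, p, eta):
    """(eta/2) ||u||^2 + (1/eta) (2 v^2 x / p + b^2 x^2 / (9 p^2))."""
    return eta / 2 * norm_u ** 2 + (2 * v2 * x / p + (b * x) ** 2 / (9 * p ** 2)) / eta

random.seed(1)
worst_gap = float("inf")
for _ in range(10000):
    norm_u = random.uniform(0.0, 10.0)
    v2 = random.uniform(0.0, 5.0)
    b = random.uniform(0.0, 5.0)
    x = random.uniform(0.01, 20.0)
    p = random.randint(1, 1000)
    eta = random.uniform(0.01, 10.0)
    gap = corollary_bound(norm_u, v2, b, x, p, eta) - bernstein_term(norm_u, v2, b, x, p)
    worst_gap = min(worst_gap, gap)
print(worst_gap)
```

A non-negative `worst_gap` (up to rounding) is what the two elementary inequalities predict for every choice of $\eta > 0$.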

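Let us introduce here some notations. For all $m$ in $\mathcal{M}_n$, let $B_m = \{ t \in S_m,\ \|t\|_2 \le 1 \}$,

\[ v_{A,m}^2 = q \sup_{t \in B_m} E\big[ (L_q t(A_1) - Pt)^2 \big], \qquad b_m = \sup_{t \in B_m} \|t\|_\infty, \]

\[ \varepsilon_n(p, q) = \sqrt{\ln n}\, \Big( \sup_{m \in \mathcal{M}_n} \Big\{ \Big( \frac{v_{A,m}^2}{R_{A,m}} \Big)^2 \vee \frac{q b_m^2}{p R_{A,m}} \Big\} \Big)^{1/4}. \]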



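Lemma 4.5. Let $A_0^*, \dots, A_{p-1}^*$ be i.i.d. random variables valued in $\mathbb{R}^q$, with $2pq = n$. Let $\mathcal{M}_n$ be a collection of models satisfying Assumptions H1, H2 and let $(p^*(m))_{m \in \mathcal{M}_n}$, $(p_W^*(m))_{m \in \mathcal{M}_n}$, $(\delta^*(m, m'))_{(m, m') \in (\mathcal{M}_n)^2}$, $(D_{A,m})_{m \in \mathcal{M}_n}$, $(R_{A,m})_{m \in \mathcal{M}_n}$ be the associated collections defined in (3.1). Let us assume that $\varepsilon_n(p, q)$ is finite. There exist constants $\kappa$ which may vary from line to line, and an event $\Omega_n^{(2)}$ satisfying $P(\Omega_n^{(2)}) \ge 1 - \kappa n^{-2}$ such that, on $\Omega_n^{(2)}$, for all $m, m'$ in $\mathcal{M}_n$,

\[ \Big| p^*(m) - \frac{2D_{A,m}}{n} \Big| \le \kappa \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.4} \]

\[ |p^*(m) - p_W^*(m)| \le \kappa \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.5} \]

\[ \delta^*(m, m') \le \kappa \varepsilon_n(p, q) \frac{R_{A,m} \vee R_{A,m'}}{n}. \tag{4.6} \]

Proof of Lemma 4.5. First, remark that, from Cauchy-Schwarz inequality,

\[ p^*(m) = \sup_{t \in B_m} (\nu_A(t))^2, \qquad \frac{2D_{A,m}}{n} = E(p^*(m)). \]

We apply Theorem 4.1 to the class $S = \{ L_q t,\ t \in B_m \}$. For all $t$ in $S$, $P[(t - Pt)^2] \le v_{A,m}^2 / q$ and $\|t\|_\infty \le b_m$, thus, for all $x > 0$, with probability larger than $1 - 2e^{-x}$, for all $\varepsilon > 0$,

\[ \Big| p^*(m) - \frac{2D_{A,m}}{n} \Big| \le \kappa \Big( \varepsilon \frac{D_{A,m}}{n} + \frac{1}{\varepsilon} \frac{v_{A,m}^2 x}{n} + \frac{1}{\varepsilon} \Big( \frac{b_m x}{\varepsilon p} \Big)^2 \Big). \]

By definition of $\varepsilon_n(p, q)$, for all $x > 0$, with probability larger than $1 - 2e^{-x}$, for all $\varepsilon > 0$,

\[ \Big| p^*(m) - \frac{2D_{A,m}}{n} \Big| \le \kappa \frac{R_{A,m}}{n} \Big( \varepsilon + \frac{\varepsilon_n^2(p, q)}{\varepsilon} \frac{x}{\ln n} + \frac{\varepsilon_n^4(p, q)}{\varepsilon^3} \frac{x^2}{(\ln n)^2} \Big). \tag{4.7} \]

Taking $x = \ln(|\mathcal{M}_n| n^2)$, $\varepsilon = \kappa\, \varepsilon_n(p, q)$, the constant $\kappa_d$ being chosen so that, for all $n$, $\varepsilon \le 1$, and using a union bound yield (4.4).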
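From (2.1) applied with $(t_\lambda)_{\lambda \in \Lambda} = (L_q(\psi_\lambda))_{\lambda \in \Lambda_m}$, $A = A^*$,

\[ p^*(m) - p_W^*(m) = U. \]

From Corollary 4.3 applied with $(t_\lambda)_{\lambda \in \Lambda} = (L_q(\psi_\lambda))_{\lambda \in \Lambda_m}$, for all $x > 0$, there exists $\Omega_x$ such that $P(\Omega_x) \ge 1 - 4e^{-x}$ and on $\Omega_x$, for all $0 < \varepsilon \le 1$,

\[ |U| \le \kappa' \Big( \varepsilon \frac{2D_{A,m}}{n} + \frac{1}{\varepsilon} \frac{v_{A,m}^2 x}{n} + \frac{b_m^2 x^2}{\varepsilon^3 p^2} \Big). \]

By definition of $\varepsilon_n(p, q)$, we deduce that

\[ |U| \le \kappa' \frac{R_{A,m}}{n} \Big( \varepsilon + \frac{\varepsilon_n^2(p, q)}{\varepsilon} \frac{x}{\ln n} + \frac{\varepsilon_n^4(p, q)}{\varepsilon^3} \frac{x^2}{(\ln n)^2} \Big). \tag{4.8} \]

Taking $x = \ln(|\mathcal{M}_n| n^2)$, $\varepsilon = \kappa\, \varepsilon_n(p, q)$, the constant $\kappa_d$ being chosen so that, for all $n$, $\varepsilon \le 1$, and using a union bound yield (4.5).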

In order to prove (4.6), we apply Corollary 4.4 to the class $S_m + S_{m'}$, with $L = L_q$ and the function $u = s_m - s_{m'}$. It comes from Assumption H1 that, for all $t$ in $S_m + S_{m'}$ such that $\|t\| \le 1$,

\[ P[(L_q t - Pt)^2] \le 4 q^{-1} (v_{A,m}^2 + v_{A,m'}^2), \qquad \|t\|_\infty \le \kappa (b_m + b_{m'}). \]

Hence, using Corollary 4.4 and the inequality

\[ \|s_m - s_{m'}\|^2 \le 2(\|s - s_m\|^2 + \|s - s_{m'}\|^2) \le 2n^{-1}(R_{A,m} + R_{A,m'}), \]

there exists a constant $\kappa$ such that, with probability larger than $1 - 2e^{-x}$,

\[ \nu_A(s_m - s_{m'}) \le \Big( \frac{1}{\eta} + \kappa \eta\, \varepsilon_n^2(p, q) \Big( \frac{x}{\ln n} + \frac{x^2}{(\ln n)^2} \Big) \Big) \Big( \frac{R_{A,m}}{n} + \frac{R_{A,m'}}{n} \Big). \tag{4.9} \]

We apply this inequality with $\eta^{-1} = \varepsilon_n(p, q)$ and $x = (2 + 2a_{\mathcal{M}}) \ln n$ and we obtain that, with probability larger than $1 - n^{-2 - 2a_{\mathcal{M}}}$, for all $m, m'$ in $\mathcal{M}_n$,

\[ \nu_A(s_m - s_{m'}) \le \kappa \varepsilon_n(p, q) \Big( \frac{R_{A,m}}{n} + \frac{R_{A,m'}}{n} \Big). \]

A union bound concludes the proof.



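Lemma 4.6. Let $A_0^*, \dots, A_{p-1}^*$ be i.i.d. random variables valued in $\mathbb{R}^q$, with $2pq = n$. Let $\mathcal{M}_n$ be a collection of wavelet spaces [W] and let $(p^*(m))_{m \in \mathcal{M}_n}$, $(p_W^*(m))_{m \in \mathcal{M}_n}$, $(\delta^*(m, m'))_{(m, m') \in (\mathcal{M}_n)^2}$, $(D_{A,m})_{m \in \mathcal{M}_n}$, $(R_{A,m})_{m \in \mathcal{M}_n}$ be the associated collections defined in (3.1). Let us assume that $\varepsilon_n(p, q)$ is finite. There exist constants $\kappa_1$, $\kappa_2$ which may vary from line to line such that

\[ E\Big( \sup_{m \in \mathcal{M}_n} \Big( p^*(m) - \frac{2D_{A,m}}{n} - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n} \Big)_+ \Big) \le \frac{\kappa_2}{n}, \]

\[ E\Big( \sup_{m \in \mathcal{M}_n} \Big( \frac{2D_{A,m}}{n} - p^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n} \Big)_+ \Big) \le \frac{\kappa_2}{n}, \]

\[ E\Big( \sup_{m \in \mathcal{M}_n} \Big( p^*(m) - p_W^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n} \Big)_+ \Big) \le \frac{\kappa_2}{n}, \]

\[ E\Big( \sup_{m \in \mathcal{M}_n} \Big( p_W^*(m) - p^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n} \Big)_+ \Big) \le \frac{\kappa_2}{n}, \]

\[ E\Big( \sup_{(m, m') \in (\mathcal{M}_n)^2} \Big( \delta^*(m, m') - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m} \vee R_{A,m'}}{n} \Big)_+ \Big) \le \frac{\kappa_2}{n}. \]

Proof: Let us denote by $\mathcal{M}$ the set $\mathcal{M}_n$ or $(\mathcal{M}_n)^2$ and for all $\mathbf{m}$ in $\mathcal{M}$, let

\[ f_1^*(\mathbf{m}, \kappa_1) = p^*(m) - \frac{2D_{A,m}}{n} - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.15} \]

\[ f_2^*(\mathbf{m}, \kappa_1) = \frac{2D_{A,m}}{n} - p^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.16} \]

\[ f_3^*(\mathbf{m}, \kappa_1) = p^*(m) - p_W^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.17} \]

\[ f_4^*(\mathbf{m}, \kappa_1) = p_W^*(m) - p^*(m) - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m}}{n}, \tag{4.18} \]

\[ f_5^*(\mathbf{m}, \kappa_1) = \delta^*(m, m') - \kappa_1 \varepsilon_n(p, q) \frac{R_{A,m} \vee R_{A,m'}}{n}. \tag{4.19} \]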

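Fact 8. For all $i = 1, \dots, 5$, we have

\[ E\Big( \sup_{\mathbf{m} \in \mathcal{M}} (f_i^*(\mathbf{m}, \kappa_1))_+ \Big) \le \sum_{\mathbf{m} \in \mathcal{M}} E\big( (f_i^*(\mathbf{m}, \kappa_1))_+ \big) = \sum_{\mathbf{m} \in \mathcal{M}} \int_0^\infty P(f_i^*(\mathbf{m}, \kappa_1) > x)\, dx. \]

If $\mathcal{M} = \mathcal{M}_n$, for all $\mathbf{m} = m$ in $\mathcal{M}$, let $R_{A,\mathbf{m}} = R_{A,m}$. If $\mathcal{M} = (\mathcal{M}_n)^2$, for all $\mathbf{m} = (m, m')$ in $\mathcal{M}$, let $R_{A,\mathbf{m}} = R_{A,m} \vee R_{A,m'}$.

Fact 9. There exist $\kappa_1$, $\kappa_2$ such that, for all $y > 0$, for all $i = 1, \dots, 5$, for all $\mathbf{m}$ in $\mathcal{M}$,

\[ P\Big( f_i^*(\mathbf{m}, \kappa_1) > \kappa_1 \varepsilon_n(p, q) \frac{R_{A,\mathbf{m}}}{n} (y + y^2) \Big) \le \frac{\kappa_2}{n^2} e^{-y}. \]

Proof: When $i \in \{1, 2\}$, Fact 9 follows from (4.7) applied with $x = y + \ln(|\mathcal{M}_n| n^2)$, $\varepsilon = \kappa\, \varepsilon_n(p, q)$, the constant $\kappa_d$ being chosen so that, for all $n$, $\varepsilon \le 1$. When $i \in \{3, 4\}$, Fact 9 follows from (4.8) applied with $x = y + \ln(|\mathcal{M}_n| n^2)$, $\varepsilon = \kappa\, \varepsilon_n(p, q)$, with the same constant $\kappa_d$. When $i = 5$, Fact 9 follows from (4.9) applied with $x = y + \ln(|(\mathcal{M}_n)^2| n^2)$, $\varepsilon = \kappa\, \varepsilon_n(p, q)$, with the same constant $\kappa_d$.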

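Fact 10. Assume that there exist $a_{\mathbf{m}}$, $\kappa$ such that, for all $x > 0$,

\[ P\big( f(\mathbf{m}) > a_{\mathbf{m}} (x + x^2) \big) \le \kappa e^{-x}. \]

Then

\[ \int_0^\infty P(f(\mathbf{m}) > x)\, dx \le 3 \kappa a_{\mathbf{m}}. \]

Proof: We use the change of variables $x = a_{\mathbf{m}}(y + y^2)$ in the integral, it gives

\[ \int_0^\infty P(f(\mathbf{m}) > x)\, dx \le \kappa \int_0^\infty e^{-y} a_{\mathbf{m}} (1 + 2y)\, dy = 3 \kappa a_{\mathbf{m}}. \]

Facts 8, 9, 10 together give that for all $i$ in $\{1, \dots, 5\}$,

\[ E\Big( \sup_{\mathbf{m} \in \mathcal{M}} (f_i^*(\mathbf{m}, \kappa_1))_+ \Big) \le \kappa \sum_{\mathbf{m} \in \mathcal{M}} \frac{R_{A,\mathbf{m}}}{n^3}. \]

It comes from Lemma 5.2 below that, in the collection [W], for all $m$ in $\mathcal{M}_n$, $R_{A,m} \le \kappa n$ for some constant $\kappa$, hence $\sum_{\mathbf{m} \in \mathcal{M}} R_{A,\mathbf{m}} \le \kappa n (\ln n)^2 / \ln 2$, which concludes the proof. $\square$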
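The constant 3 in Fact 10 is the value of $\int_0^\infty e^{-y}(1 + 2y)\,dy$. The following sketch approximates that integral with a plain midpoint rule. The truncation point 60 and the number of steps are arbitrary choices made for this illustration, not quantities from the paper.

```python
import math

def integrand(y):
    """Integrand obtained after the change of variables x = a (y + y^2), with a = kappa = 1."""
    return math.exp(-y) * (1.0 + 2.0 * y)

def midpoint_integral(f, lower, upper, steps):
    """Midpoint rule on [lower, upper] with equally spaced cells."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (k + 0.5) * width) for k in range(steps))

# The tail beyond y = 60 is of order exp(-60) and is ignored here.
value = midpoint_integral(integrand, 0.0, 60.0, 200000)
print(value)
```

Analytically, $\int_0^\infty e^{-y}\,dy = 1$ and $\int_0^\infty 2y e^{-y}\,dy = 2$, which gives the factor 3 used in the proof.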

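Let us now explain why our Assumptions ensure that, in both the $\beta$-mixing and the $\tau$-mixing cases, $\varepsilon_n(p, q) \le \kappa (\ln n)^{-1/2}$.

5. Computation of $\varepsilon_n(p, q)$.

5.1. The $\beta$-mixing case. We recall the following inequality for $\beta$-mixing processes (see for example Comte and Merlevede (2002) inequalities (6.2) and (6.3)). Let $(X_n)_{n \in \mathbb{Z}}$ be $\beta$-mixing data. There exists a function $\nu$ satisfying, for all $p, q$ in $\mathbb{N}$,

\[ \nu = \sum_{l \ge 1} \nu_l \quad \text{with} \quad P\nu_q \le \beta_q \quad \text{and} \quad P(\nu^p) \le p \sum_{l \ge 1} (l + 1)^{p-1} \beta_l, \tag{5.1} \]

such that, for all $t$ in $L^2(\mu)$,

\[ \Big| \mathrm{Var}\Big( \sum_{i=1}^q t(X_i) \Big) - q\, \mathrm{Var}(t(X_1)) \Big| \le 4q\, P(\nu t^2). \tag{5.2} \]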



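Lemma 5.1. Let $\theta > 1$ and let $(X_n)_{n \in \mathbb{Z}}$ be an arithmetically [AR($\theta$)] $\beta$-mixing process satisfying S($\beta$). Let $S_m$ be a linear space satisfying H3, H4. Let $p, q$ such that $2pq = n$, $\sqrt{n}(\ln n)^2 / 2 \le p \le \sqrt{n}(\ln n)^2$. We have

\[ \frac{c_D}{4} b_m^2 - \|s_m\|^2 \le D_{A,m} \le 2 b_m^2, \qquad v_{A,m}^2 \le \kappa \frac{R_{A,m}}{(\ln n)^2}, \qquad \frac{q b_m^2}{p} \le \kappa \frac{R_{A,m}}{(\ln n)^4}. \]

In particular, $\varepsilon_n(p, q) \le \kappa (\ln n)^{-1/2}$.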
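The constraint on the block sizes in Lemma 5.1 is what produces the factor $(\ln n)^{-4}$ in the bound on $q b_m^2 / p$: since $2pq = n$, one has $q/p = n/(2p^2)$, and $p \ge \sqrt{n}(\ln n)^2/2$ then gives $q/p \le 2/(\ln n)^4$. The sketch below checks this arithmetic. It treats $p$ and $q$ as real numbers and ignores integrality, which is a simplifying assumption of this illustration; the function name is hypothetical.

```python
import math

def block_ratio_bounds(n):
    """Return (p_min, p_max, worst q/p) under 2 p q = n, with p and q treated as reals."""
    log_sq = math.log(n) ** 2
    p_min = math.sqrt(n) * log_sq / 2
    p_max = math.sqrt(n) * log_sq
    # q / p = n / (2 p^2) decreases in p, so the worst case is p = p_min.
    worst_ratio = n / (2 * p_min ** 2)
    return p_min, p_max, worst_ratio

for n in (10 ** 4, 10 ** 6, 10 ** 8):
    p_min, p_max, ratio = block_ratio_bounds(n)
    print(n, ratio, 2 / math.log(n) ** 4)
```

Combined with a bound of the form $b_m^2 \le \kappa R_{A,m}$, this ratio yields the third inequality of the lemma.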

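Proof: We have, from inequality (5.2),

\[ \Big| D_{A,m} - \sum_{\lambda \in \Lambda_m} \mathrm{Var}(\psi_\lambda(X_1)) \Big| = \Big| \sum_{\lambda \in \Lambda_m} \Big( \frac{1}{q} \mathrm{Var}\Big( \sum_{i=1}^q \psi_\lambda(X_i) \Big) - \mathrm{Var}(\psi_\lambda(X_1)) \Big) \Big| \le 4P\Big( \nu \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big). \]

Moreover

\[ \sum_{\lambda \in \Lambda_m} \mathrm{Var}(\psi_\lambda(X_1)) = P\Big( \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big) - \sum_{\lambda \in \Lambda_m} (P\psi_\lambda)^2 = P\Big( \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big) - \|s_m\|^2. \]

It comes from Cauchy-Schwarz inequality that $\sum_{\lambda \in \Lambda_m} \psi_\lambda^2 = (\sup_{t \in B_m} t)^2$, hence, from H4,

\[ c_D b_m^2 \le P\Big( \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big) \le b_m^2. \]

Finally,

\[ 4P\Big( \nu \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big) \le \frac{1}{2} P\Big( \sum_{\lambda \in \Lambda_m} \psi_\lambda^2 \Big) + 8 b_m^2 P(\nu^2). \]

We deduce that

\[ \Big( \frac{c_D}{2} - 8P(\nu^2) \Big) b_m^2 - \|s_m\|^2 \le D_{A,m} \le \Big( \frac{3}{2} + 8P(\nu^2) \Big) b_m^2 - \|s_m\|^2. \]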



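From (5.1) and S($\beta$), we have $8P(\nu^2) \le 16 \sum_{l \ge 1} (1 + l)\beta_l \le c_D / 4$. As $R_{A,m} \ge \kappa_R (\ln n)^8$,

\[ b_m^2 \le \frac{4}{c_D} (D_{A,m} + \|s_m\|^2) \le \frac{4}{c_D} \Big( 1 + \frac{\|s\|^2}{R_{A,m}} \Big) R_{A,m}. \]

Let us then denote by $\kappa_\infty$ a constant bounding the factor in front of $R_{A,m}$ in the last display, we have obtained that $b_m^2 \le \kappa_\infty R_{A,m}$. Our definition of $p$ and $q$ yields then

\[ \frac{q b_m^2}{p} \le \kappa \frac{R_{A,m}}{(\ln n)^4}. \]

We also have

\[ v_{A,m}^2 = q \sup_{t \in B_m} E\big[ (L_q t(A_1) - Pt)^2 \big]. \]

Let $t$ in $S_m$, we have, from inequality (5.2) and S($\beta$),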

Var (l E t(Xi)j < -P ( (1 + 4z^ 2 ) < 1 ||t 1^ (i>|t| + 4v / JvW ) 



26^ 
9 



00 p 1 /* ' 






Since R n > K^(lnn) 8 , we deduce that 

v 



Ra 



A,m . 2y ||s||Koo 

< — (inn) 



,V4 



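5.2. The $\tau$-mixing case.

Lemma 5.2. Let $c_\tau$ be the constant defined in Lemma 5.3 and let $C(\varphi, \psi) = c_D K_2^2 / (8 c_\tau A K_\infty K_{BV})$. Let $\theta > 1$ and let $(X_n)_{n \in \mathbb{Z}}$ be an arithmetically [AR($\theta$)] $\tau$-mixing process satisfying S($\tau$, W). Let $S_m$ be a linear space in the collection [W] satisfying Assumptions H3, H4. Let $p, q$ such that $2pq = n$, $\sqrt{n}(\ln n)^2 / 2 \le p \le \sqrt{n}(\ln n)^2$. We have

\[ \frac{c_D K_2^2}{2} 2^{J_m} - \|s_m\|^2 \le D_{A,m} \le \Big( A K_\infty^2 + \frac{c_D K_2^2}{2} \Big) 2^{J_m} - \|s_m\|^2, \tag{5.3} \]

\[ v_{A,m}^2 \le \kappa \frac{R_{A,m}}{(\ln n)^2}, \qquad \frac{q b_m^2}{p} \le \kappa \frac{R_{A,m}}{(\ln n)^4}. \]

In particular, $\varepsilon_n(p, q) \le \kappa (\ln n)^{-1/2}$.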

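Proof:

Control of $b_m^2$: We use the following elementary inequalities:

\[ \Big\| \sum_{(j,k) \in \Lambda_m} \psi_{j,k}^2 \Big\|_\infty \le A K_\infty^2 2^{J_m}, \qquad \Big\| \sum_{(j,k) \in \Lambda_m} \psi_{j,k}^2 \Big\|_\infty \ge \max_{(j,k) \in \Lambda_m} \|\psi_{j,k}^2\|_\infty \ge K_2^2 2^{J_m}. \]

We obtain that

\[ K_2^2 2^{J_m} \le b_m^2 \le A K_\infty^2 2^{J_m}. \tag{5.4} \]

Control of $v_{A,m}^2$: Let $t$ be a function in $B_m$. First, we use a simple bound

We use the coupling lemma obtained in Dedecker and Prieur (2005) Section 7.1, to define random variables $X_l^*$ independent of $X_1$, such that

\[ E(|X_l - X_l^*|) \le \tau_{l-1}. \]

Then

\[ |\mathrm{Cov}(t(X_1), t(X_l))| = |\mathrm{Cov}(t(X_1), t(X_l) - t(X_l^*))| \le \sqrt{\mathrm{Var}(t(X_1))\, E\big[ (t(X_l) - t(X_l^*))^2 \big]} \]

\[ \le \sqrt{\mathrm{Var}(t(X_1))\, 2 b_m\, E\big[ |t(X_l) - t(X_l^*)| \big]} \le \sqrt{\mathrm{Var}(t(X_1))\, 2 b_m\, \mathrm{Lip}(t)\, \tau_{l-1}}. \]

Moreover, let $a_{j,k} = \int_{\mathbb{R}} t \psi_{j,k}\, d\mu$, then

\[ \mathrm{Lip}(t) = \sup_{x \ne y \in \mathbb{R}} \frac{|t(x) - t(y)|}{|x - y|} \le \sum_{j=0}^{J_m} \sup_{k \in \mathbb{Z}} |a_{j,k}| \sup_{x \ne y \in \mathbb{R}} \sum_{k \in \mathbb{Z}} \frac{|\psi_{j,k}(x) - \psi_{j,k}(y)|}{|x - y|} \le 2 A K_L \sum_{j=0}^{J_m} 2^{3j/2} \sup_{k \in \mathbb{Z}} |a_{j,k}|. \tag{5.5} \]

The last inequality holds since, for all $x, y$ in $\mathbb{R}$, there are less than $2A$ indices $k$ in $\mathbb{Z}$ such that $|\psi_{j,k}(x) - \psi_{j,k}(y)| \ne 0$. Since $t$ belongs to $B_m$, $\sum_{(j,k) \in \Lambda_m} a_{j,k}^2 \le 1$, in particular, for all $j$, $\sup_{k \in \mathbb{Z}} |a_{j,k}| \le 1$. Thus, there exists a constant $c$ such that $\mathrm{Lip}(t) \le c 2^{3J_m/2}$. Hence, there exists a constant $c$ such that, for all $t$ in $B_m$ and all $l$ in $\mathbb{N}^*$,

\[ |\mathrm{Cov}(t(X_1), t(X_l))| \le c 2^{5J_m/4} \sqrt{\tau_{l-1}}. \]

Remark that we also have

\[ |\mathrm{Cov}(t(X_1), t(X_l))| \le \|t\|_\infty \|t\| \|s\| \le c 2^{J_m/2}. \]

Let $u = 3/(1 + \theta)$, there exist constants $c$, which may vary from line to line, such that

\[ \sum_{l=1}^\infty |\mathrm{Cov}(t(X_1), t(X_l))| \le c 2^{J_m/2} \sum_{l=1}^\infty \big( 2^{3J_m/4} \sqrt{\tau_{l-1}} \wedge 1 \big) \le c 2^{J_m/2} \sum_{l=1}^\infty \big( 2^{3J_m/4} l^{-(1+\theta)/2} \wedge 1 \big) \]

\[ \le c 2^{J_m/2} \Big( \sum_{l=1}^{2^{uJ_m/2}} 1 + \sum_{l=2^{uJ_m/2}}^\infty 2^{3J_m/4} l^{-(1+\theta)/2} \Big) \le c 2^{\frac{J_m}{2}(1+u)}. \]

Since $\theta > 5$, $u < 1/2$ and we have obtained that

\[ v_{A,m}^2 \le \kappa 2^{3J_m/4}. \tag{5.6} \]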
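The splitting of the covariance sum at $l = 2^{uJ_m/2}$ can be illustrated numerically. The sketch below compares a truncated version of $\sum_{l \ge 1} (2^{3J_m/4} l^{-(1+\theta)/2} \wedge 1)$ with the bound $2^{uJ_m/2}\, s/(s-1)$, where $s = (1+\theta)/2$. This explicit constant comes from comparing the sum of a nonincreasing function with its integral over $(0, \infty)$; it is an assumption of this illustration rather than a constant stated in the paper, and $\theta = 6$ is an arbitrary admissible value.

```python
def truncated_sum(j_m, theta, terms):
    """Partial sum of min(1, 2^(3 J_m / 4) * l^(-(1 + theta) / 2)) for l = 1..terms."""
    scale = 2.0 ** (3.0 * j_m / 4.0)
    exponent = (1.0 + theta) / 2.0
    return sum(min(1.0, scale * l ** (-exponent)) for l in range(1, terms + 1))

def integral_bound(j_m, theta):
    """l0 * s / (s - 1), with l0 = 2^(u J_m / 2), u = 3 / (1 + theta), s = (1 + theta) / 2."""
    u = 3.0 / (1.0 + theta)
    s = (1.0 + theta) / 2.0
    return 2.0 ** (u * j_m / 2.0) * s / (s - 1.0)

theta = 6.0  # any theta > 5 gives u < 1/2
for j_m in (0, 4, 8, 12, 16):
    partial = truncated_sum(j_m, theta, 50000)
    bound = integral_bound(j_m, theta)
    print(j_m, partial, bound)
```

The threshold $l_0 = 2^{uJ_m/2}$ is exactly the point where $2^{3J_m/4} l^{-(1+\theta)/2}$ equals 1, which explains the choice $u = 3/(1+\theta)$.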

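Control of $D_{A,m}$: Let us recall the following lemma, obtained in Lerasle (2009a) as a consequence of the covariance inequality for $\tau$-mixing sequences proved by Dedecker and Prieur (2005).

Lemma 5.3. Let $X$, $Y$ be two identically distributed real valued random variables, with common density $s$ in $L^2(\mu)$. There exist a constant $c_\tau$ and a random variable $\nu(\sigma(X), Y)$ satisfying

\[ E(\nu(\sigma(X), Y)) = c_\tau \|s\|^{2/3} (\tau(\sigma(X), Y))^{1/3} \]

such that, for all Lipschitz functions $g$ and all $h$ in $BV$,

\[ |\mathrm{Cov}(g(X), h(Y))| \le \|h\|_{BV}\, E\big( |g(X)|\, \nu(\sigma(X), Y) \big) \le c_\tau \|h\|_{BV} \|g\|_\infty \|s\|^{2/3} (\tau(\sigma(X), Y))^{1/3}. \tag{5.7} \]

It comes from this Lemma and inequalities (3.5, 3.6, 3.7) that

\[ \Big| D_{A,m} - \sum_{(j,k) \in \Lambda_m} \mathrm{Var}(\psi_{j,k}(X_1)) \Big| \le \frac{2}{q} \sum_{(j,k) \in \Lambda_m} \sum_{l=2}^q (q + 1 - l)\, |\mathrm{Cov}(\psi_{j,k}(X_1), \psi_{j,k}(X_l))| \]

\[ \le 2 \sum_{j=0}^{J_m} \sum_{k \in \mathbb{Z}} \sum_{l=2}^q \|\psi_{j,k}\|_{BV}\, E\big( |\psi_{j,k}(X_1)|\, \nu(\sigma(X_1), X_l) \big) \]

\[ \le 2 c_\tau K_{BV} \|s\|^{2/3} \sum_{j=0}^{J_m} 2^{j/2} \Big\| \sum_{k \in \mathbb{Z}} |\psi_{j,k}| \Big\|_\infty \sum_{l=2}^\infty \tau_{l-1}^{1/3} \]

\[ \le 4 \Big( c_\tau A K_\infty K_{BV} \|s\|^{2/3} \sum_{l=1}^\infty \tau_l^{1/3} \Big) 2^{J_m} \le \frac{c_D K_2^2}{2} 2^{J_m}. \]

The last inequality comes from S($\tau$, W). Since $\sum_{(j,k) \in \Lambda_m} (P\psi_{j,k})^2 = \|s_m\|^2$, we deduce that

\[ -\frac{c_D K_2^2}{2} 2^{J_m} \le D_{A,m} - P\Big( \sum_{(j,k) \in \Lambda_m} \psi_{j,k}^2(X_1) \Big) + \|s_m\|^2 \le \frac{c_D K_2^2}{2} 2^{J_m}. \]

We always have

\[ P\Big( \sum_{(j,k) \in \Lambda_m} \psi_{j,k}^2(X_1) \Big) = P\Big( \sup_{t \in B_m} t^2 \Big) \le b_m^2 \le A K_\infty^2 2^{J_m}. \]

From H4,

\[ P\Big( \sum_{(j,k) \in \Lambda_m} \psi_{j,k}^2(X_1) \Big) \ge c_D b_m^2 \ge c_D K_2^2 2^{J_m}. \]

We obtain finally

\[ \frac{c_D K_2^2}{2} 2^{J_m} - \|s_m\|^2 \le D_{A,m} \le \Big( A K_\infty^2 + \frac{c_D K_2^2}{2} \Big) 2^{J_m} - \|s_m\|^2. \tag{5.8} \]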

Conclusion of the proof: It comes from inequalities (5.6) and (5.8) that

\[ v_{A,m}^2 \le \kappa (D_{A,m} + \|s_m\|^2)^{3/4} \le \kappa (R_{A,m} + \|s\|^2)^{3/4}. \]

As $R_{A,m} \ge \kappa_R (\ln n)^8$, this implies that

\[ v_{A,m}^2 \le \kappa \frac{R_{A,m}}{R_{A,m}^{1/4}} \le \kappa \frac{R_{A,m}}{(\ln n)^2}. \]

Now, from inequalities (5.4) and (5.8), we have

\[ b_m^2 \le \kappa (D_{A,m} + \|s\|^2) \le \kappa' R_{A,m}. \]

Hence, our choice of $p$ and $q$ ensures that

\[ \frac{q b_m^2}{p} \le \kappa \frac{R_{A,m}}{(\ln n)^4}. \]
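6. Conclusion of the proofs.

6.1. Proof of Lemma 1.1. Let us use the notations $f_i^*(\mathbf{m}, \kappa_1)$ defined in (4.15-4.19). For all $i = 1, \dots, 5$,

\[ E\Big( \sup_{\mathbf{m} \in \mathcal{M}} (f_i(\mathbf{m}, \kappa_1))_+ \Big) \le E\Big( \sup_{\mathbf{m} \in \mathcal{M}} |f_i(\mathbf{m}, \kappa_1) - f_i^*(\mathbf{m}, \kappa_1)| \Big) + E\Big( \sup_{\mathbf{m} \in \mathcal{M}} (f_i^*(\mathbf{m}, \kappa_1))_+ \Big). \]

It comes from Lemma 3.4 that, for all $i = 1, \dots, 5$,

\[ E\Big( \sup_{\mathbf{m} \in \mathcal{M}} |f_i(\mathbf{m}, \kappa_1) - f_i^*(\mathbf{m}, \kappa_1)| \Big) \le (4\sqrt{8}/p)\, \tau_q M C_n. \]

It comes from Lemma 3.6 that $\tau_q M C_n \le \kappa n^{-1}$ since $\theta > 5$. Moreover, Lemmas 4.6 and 5.2 ensure that $E\big( \sup_{\mathbf{m} \in \mathcal{M}} (f_i^*(\mathbf{m}, \kappa_1))_+ \big) \le \kappa_2 / n$.

6.2. Proof of Lemma 1.2. For all $i = 1, \dots, 5$ and all $\mathbf{m}$ in $\mathcal{M}$, let $f_i^*(\mathbf{m})$, $f_i(\mathbf{m})$ be the random variables defined in (4.15-4.19) and (1.1-1.5) for $\kappa_1 = 0$. Let $\Omega^*$ be the set defined by Lemma 3.3. From Lemma 3.5, we have $P((\Omega^*)^c) \le \kappa_2 (\ln n)^{2(1+\theta)} n^{-\theta/2}$ and, on $\Omega^*$,

\[ \forall i = 1, \dots, 5,\ \forall \mathbf{m} \in \mathcal{M}, \quad |f_i(\mathbf{m})| = |f_i^*(\mathbf{m})|. \]

From Lemmas 4.5 and 5.1, there exists a set $\Omega_n^{(2)}$, with $P((\Omega_n^{(2)})^c) \le \kappa n^{-2}$ such that, on $\Omega_n^{(2)}$,

\[ \forall i = 1, \dots, 5,\ \forall \mathbf{m} \in \mathcal{M}, \quad |f_i^*(\mathbf{m})| \le \kappa_1 \varepsilon_n \frac{R_{A,\mathbf{m}}}{n}. \]

Hence, on $\Omega^* \cap \Omega_n^{(2)}$, the conclusion of the lemma holds.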